<i>To get optimum performance with PostgreSQL full text search you need to create a column to store the tsvector values with an index on it. Then you need to fill this column with the tsv values and create a trigger to update the field on INSERT and UPDATE.</i><p>I've been playing around with full text search in Postgres, and I took this sort of approach when starting out, but then realized I could just have the index be an expression.<p>So instead of (per the example)<p><pre><code> CREATE INDEX tsv_idx ON documents USING gin(tsv);
</code></pre>
doing something like<p><pre><code> CREATE INDEX tsv_idx ON documents USING gin(to_tsvector('english', text));
</code></pre>
Is there any reason you <i>wouldn't</i> do this? For multi-language you'd have to detect the language of the text, but there's no reason you couldn't parameterize that too.
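<p>One caveat worth noting: the planner will only use an expression index when the query repeats the exact same expression, so the matching query (sketched against the <i>documents</i> table and <i>text</i> column from the example; the search terms are just illustrative) has to look something like<p><pre><code> SELECT id FROM documents
 WHERE to_tsvector('english', text) @@ to_tsquery('english', 'postgres &amp; search');
</code></pre>
If a query uses a different configuration argument, or calls to_tsvector with the single-argument form, the index won't be used and you're back to a sequential scan.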
> This query takes about 50ms! We experimented with returning the document text with results and found that if we returned the full text of documents, it added about 350ms to the query time which seems to be network overhead rather than a slower query. If we only returned 200 characters from the texts then it only added around 100ms.<p>The way you have written the query, it has to extract the title from all of the JSON documents before applying the LIMIT.<p>Perhaps do the work to get the identifiers, order by rank, and apply the limit all in a subquery, so that only the outer query actually touches the JSON documents and extracts the titles.<p>Try it and report back on the speed improvement.
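<p>Roughly something like this (table and column names are guessed from the post, so adjust to the real schema):<p><pre><code> SELECT d.meta-&gt;&gt;'title' AS title, r.rank
 FROM (
   SELECT id, ts_rank(tsv, query) AS rank
   FROM documents, to_tsquery('english', 'search &amp; terms') query
   WHERE tsv @@ query
   ORDER BY rank DESC
   LIMIT 10
 ) r
 JOIN documents d ON d.id = r.id
 ORDER BY r.rank DESC;
</code></pre>
That way the JSON extraction happens for 10 rows instead of every matching row.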
Sure, if you only need full text search there's no need to use Solr or ElasticSearch.<p>I've never tried with PostgreSQL, but I noticed the same drop in performance in MySQL at the 1-2 million document count, although full text search queries in MySQL are slower (from what I remember, I haven't tested in a couple of years).<p>The power of Solr and other tools out there is all the other stuff you can do, like synonyms processing, faceting, range queries, etc, etc... as well as give you more flexibility on what data you load and how you process it. But if you don't need it, there's no reason to use a different tool.
"The accuracy of PostgreSQL searches is not the best. In the examples that follow I’ve explained how to weight titles over text, but we found a few cases where typing the exact title of a document wouldn’t return that document as the first result."<p>Isn't this a concern as the main objective of search is to provide accurate results?
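<p>For context, the weighting in question looks something like this (column names are illustrative): titles get weight 'A' and body text weight 'D', and ts_rank folds those weights into the score.<p><pre><code> UPDATE documents SET tsv =
     setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
     setweight(to_tsvector('english', coalesce(text, '')), 'D');

 SELECT id, ts_rank(tsv, query) AS rank
 FROM documents, to_tsquery('english', 'exact &amp; title &amp; words') query
 WHERE tsv @@ query
 ORDER BY rank DESC;
</code></pre>
Even then, ts_rank also factors in things like term frequency and proximity, which is presumably how an exact title match can still lose to a longer document that repeats the query terms.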
Nice writeup, I wrote a similar blog post a few years ago: <a href="http://tech.pro/tutorial/1142/building-faceted-search-with-postgresql" rel="nofollow">http://tech.pro/tutorial/1142/building-faceted-search-with-p...</a><p>Mine covers tags and other aspects of faceted search as well.<p>We were able to build a really simple search facility for our marketplace using Postgres in a very short period of time and for cheap. It was incredibly helpful for a small startup short on people and resources.
Naive question from a person with little DB experience: Why store the metadata in JSON inside the database instead of in multiple columns in the same table? Is this not the point of using a database?
> The main arguments against using PostgreSQL search are accuracy and speed at scale.<p>Also, lack of faceted search, phrase search, partial word search (although you could use pg_trgm)<p>Previous 2 related discussions:
- <a href="https://news.ycombinator.com/item?id=8381748" rel="nofollow">https://news.ycombinator.com/item?id=8381748</a>
- <a href="https://news.ycombinator.com/item?id=8714477" rel="nofollow">https://news.ycombinator.com/item?id=8714477</a>
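<p>For what it's worth, the pg_trgm route for partial word matching is fairly painless (table and column names assumed):<p><pre><code> CREATE EXTENSION IF NOT EXISTS pg_trgm;

 -- a trigram GIN index can serve both similarity and ILIKE matches
 CREATE INDEX documents_title_trgm_idx ON documents
   USING gin (title gin_trgm_ops);

 SELECT id, title FROM documents
 WHERE title % 'postgre' OR title ILIKE '%postgre%';
</code></pre>
It still doesn't give you faceting or phrase search, though.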
I tried using Postgres 9.4 full-text search for my forum, but ran into some "column wider than full-text index" error (don't remember the exact issue). Wasn't running into any of these: <a href="http://www.postgresql.org/docs/8.3/static/textsearch-limitations.html" rel="nofollow">http://www.postgresql.org/docs/8.3/static/textsearch-limitat...</a><p>Googling the error led me to a mailing list where someone was "working on it" or "going to revert the problem soon".<p>Ended up using AWS CloudSearch, which costs a minimum of $50/mo, when I was already paying for Postgres.
PostgreSQL is coming up a lot and making waves. I never thought much about it until recently, when it turned out to be more reliable than MongoDB. Interesting phenomenon.
Seems sketchy.<p>While Postgres <i>can</i> do full text search and it <i>may</i> help you in your case, you can set up Elasticsearch for this sort of quantity of data in like an afternoon, if that.<p>Also, these 'advanced features' people are throwing around are really trivial; they're the sort of things that would crop up in lecture 2 of an information retrieval course. Good search is really reliant on proper weighting etc (not like 'we weight the title more highly than the body' but more 'foogle is a far more important word than bar, if we have both in a query, we care about foogle more'). This generally requires a tonne of experimentation; information retrieval is not easy and it's very subtle.<p>Postgres seems to try and make everyone design their own ranking function, which is a recipe for disaster. I haven't used Elasticsearch for a while, but I'd be surprised if they didn't have a bunch of relatively good presets.<p>Postgres is good for loads of things, but Elasticsearch or Solr are state-of-the-art and Postgres isn't (so far as I know); if you aren't just hacking around then <i>please</i> consider the extra time to set up Elasticsearch and it'll make your life easier.