Postgres Full-Text Search: A search engine in a database

601 点作者 twakefield将近 4 年前

19 条评论

rattray将近 4 年前

Something that's missing from this which I'm curious about is how far can't postgres search take you?That is, what tends to be the "killer feature" that makes teams groan and set up Elasticsearch because you just can't do it in Postgres and your business needs it?Having dealt with ES, I'd really like to avoid the operational burden if possible, but I wouldn't want to choose an intermediary solution without being able to say, "keep in mind we'll need to budget a 3-mo transition to ES once we need X, Y, or Z".

评论 #27978505 未加载

评论 #27974139 未加载

评论 #27976354 未加载

评论 #27973991 未加载

评论 #27976907 未加载

评论 #27973939 未加载

评论 #27974334 未加载

评论 #27975197 未加载

评论 #27980372 未加载

评论 #27974185 未加载

评论 #27973845 未加载

评论 #27975624 未加载

评论 #27974674 未加载

评论 #27977114 未加载

评论 #27979888 未加载

评论 #27975892 未加载

评论 #27977526 未加载

评论 #27974029 未加载

评论 #27974679 未加载

tabbott将近 4 年前

Zulip's search is powered by this built-in Postgres full-text search feature, and it's been a fantastic experience. There's a few things I love about it:* One can cheaply compose full-text search with other search operators by just doing normal joins on database indexes, which means we can cheaply and performantly support tons of useful operators (<a href="https://zulip.com/help/search-for-messages" rel="nofollow">https://zulip.com/help/search-for-messages</a>).* We don't have to build a pipeline to synchronize data between the real database and the search database. Being a chat product, a lot of the things users search for are things that changed recently; so lag, races, and inconsistencies are important to avoid. With the Postgres full-text search, all one needs to do is commit database transactions as usual, and we know that all future searches will return correct results.* We don't have to operate, manage, and scale a separate service just to support search. And neither do the thousands of self-hosted Zulip installations.Responding to the "Scaling bottleneck" concerns in comments below, one can send search traffic (which is fundamentally read-only) to a replica, with much less complexity than a dedicated search service.Doing fancy scoring pipelines is a good reason to use a specialized search service over the Postgres feature.I should also mention that a weakness of Postgres full-text search is that it only supports doing stemming for one language. The excellent PGroonga extension (<a href="https://pgroonga.github.io/" rel="nofollow">https://pgroonga.github.io/</a>) supports search in all languages; it's a huge improvement especially for character-based languages like Japanese. We're planning to migrate Zulip to using it by default; right now it's available as an option.More details are available here: <a href="https://zulip.readthedocs.io/en/latest/subsystems/full-text-search.html" rel="nofollow">https://zulip.readthedocs.io/en/latest/subsystems/full-text-...</a>

评论 #27978277 未加载

评论 #27975871 未加载

lettergram将近 4 年前

I actually built a search engine back in 2018 using postgresql<a href="https://austingwalters.com/fast-full-text-search-in-postgresql/" rel="nofollow">https://austingwalters.com/fast-full-text-search-in-postgres...</a>Worked quite well and still use it daily. Basically doing weighted searches on vectors is slower than my approach, but definitely good enough.Currently, I can search around 50m HN & Reddit comments in 200ms on the postgresql running on my machine.

评论 #27974051 未加载

评论 #27975468 未加载

jcuenod将近 4 年前

Huh, just yesterday I blogged[0] about using FTS in SQLite[1] to search my PDF database. SQLite's full-text search is really excellent. The thing that tripped me up for a while was `GROUP BY` with the `snippet`/`highlight` function but that's the point of the blog post.[0] <a href="https://jcuenod.github.io/bibletech/2021/07/26/full-text-search-for-pdfs/" rel="nofollow">https://jcuenod.github.io/bibletech/2021/07/26/full-text-sea...</a>[1] <a href="https://www.sqlite.org/fts5.html" rel="nofollow">https://www.sqlite.org/fts5.html</a>

评论 #27981279 未加载

评论 #27980002 未加载

bityard将近 4 年前

I know Postgres and SQLite have mostly different purposes but FWIW, SQLite also has a surprisingly capable full-text search extension built right in: <a href="https://www.sqlite.org/fts5.html" rel="nofollow">https://www.sqlite.org/fts5.html</a>

评论 #27976403 未加载

theandrewbailey将近 4 年前

> You could also look into enabling extensions such as unaccent (remove diacritic signs from lexemes) or pg_trgm (for fuzzy search).Trigrams (pg_trgm) are practically needed for usable search when it comes to misspellings and compound words (e.g. a search for "down loads" won't return "downloads").I also recommend using websearch_to_tsquery instead of using the cryptic syntax of to_tsquery.

评论 #27975165 未加载

simonw将近 4 年前

The Django ORM includes support for PostgreSQL search and I've found it a really productive way to add search to a project: <a href="https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/search/" rel="nofollow">https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/s...</a>

SigmundA将近 4 年前

Keep wondering if RUM Indexes [1] will ever get merged for faster and better ranking (TF/IDF). Really would make PG a much more complete text search engine.<a href="https://github.com/postgrespro/rum" rel="nofollow">https://github.com/postgrespro/rum</a>

Grimm1将近 4 年前

Postgres Full-Text search is a great way to get search running for a lot of standard web applications. I recently used just this in Elixir to set up a simple search by keyword. My only complaint was Ecto (Elixir's query builder library) doesn't have first class support for it and neither does Postgrex the lower level connector they use. Still, using fragments with sanitized SQL wasn't too messy at all.

thom将近 4 年前

We get really nice results with gist indexes (gist_trgm_ops) searching across multiple entity types to do top X queries. It’s very useful to be able to make a stab at a difficult-to-spell foreign football player’s name, possibly with lots of diacritics, and get quick results back. I’m always surprised when I find a search engine on any site that is so unkind as to make you spell things exactly.

rattray将近 4 年前

TBH I hadn't known you could do weighted ranking with Postgres search before.Curious there's no mention of zombodb[0] though, which gives you the full power of elasticsearch from within postgres (with consistency no, less!). You have to be willing to tolerate slow writes, of course, so using postgres' built-in search functionality still makes sense for a lot of cases.[0] <a href="https://github.com/zombodb/zombodb" rel="nofollow">https://github.com/zombodb/zombodb</a>

评论 #27973820 未加载

syoc将近 4 年前

My worst search experiences always come from the features applauded here. Word stemming and removing stop words is a big hurdle when you know what you are looking for but get flooded by noise because some part of the search string was ignored. Another issue is having to type out a full word before you get a hit in dynamic search boxes (looking at you Confluence).

评论 #27975992 未加载

MushyRoom将近 4 年前

I was hyped when I found out about it a while ago. Then I wasn't anymore.When you have 12 locales (kr/ru/cn/jp/..) it's not that fun anymore. Especially on a one man project :)

评论 #27977665 未加载

评论 #27974114 未加载

kureikain将近 4 年前

I used Postgres full-text search for mail log feature on my email forward app <a href="https://hanami.run" rel="nofollow">https://hanami.run</a>Essentially allow arbitraty query in from/to/subject/body. One thing that make full-text serch work great for me is that I don't need to sort or rank the relevant of query. I just show a list of email that match the query order by their id.I also don't do pagination and counting, instead users has to load more paged and the ID of the email is pass to the query as a point to compare( where id < requests.get.before).And with those strategy, full text search works great for us since we don't really want to bring in ElasticSearch because only about 20% of users use this features.

pvsukale3将近 4 年前

If you are using Rails with Postgres you can use pg_search gem to build the named scopes to take advantage of full text search.<a href="https://github.com/Casecommons/pg_search" rel="nofollow">https://github.com/Casecommons/pg_search</a>

shakascchen将近 4 年前

No fun doing it for Chinese, especially for traditional Chinese.I had to install software but on Cloud SQL you can't. You have to do it on your instances.

评论 #27979723 未加载

eric4smith将近 4 年前

Postgres FTS is normally quite good.But it does not know how to deal with languages like Chinese, Japanese and Thai.For that you have to use something like PGroonga extension.The rest of PostgreSQL mostly handles things ok, unless you try to sort on one of these languages and the same things happen again.There are all ways around these problems. But it’s not as easy as turning on Unicode and just expect everything to work!Yes I’m native English speaker who started to develop in Asia and discovered all of this recently.

mrinterweb将近 4 年前

I've seen Elasticsearch set up for applications that would have equal benefit from just using the postgresql db's full-text search they already have access to.The additional complexity is usually incurred when the data in postgresql changes, and those changes need to be mirrored up to Elasticsearch. Elasticsearch obviously has its uses, but for some cases, postgresql's built in full-text search can make more sense.

nuker将近 4 年前

Is there alternative to ES that scales nicely? I'm running ELK stack for logging using AWS Elasticsearch. Logs have unpredictable traffic volume and even overprovisioned ES cluster gets clogged sometimes. I wonder is there something more scalable than ES, and have nice GUI like Kibana?

评论 #28114878 未加载

评论 #27981465 未加载

评论 #27982064 未加载

评论 #27980577 未加载

19 条评论

rattray将近 4 年前

评论 #27978505 未加载

评论 #27974139 未加载

评论 #27976354 未加载

评论 #27973991 未加载

评论 #27976907 未加载

评论 #27973939 未加载

评论 #27974334 未加载

评论 #27975197 未加载

评论 #27980372 未加载

评论 #27974185 未加载

评论 #27973845 未加载

评论 #27975624 未加载

评论 #27974674 未加载

评论 #27977114 未加载

评论 #27979888 未加载

评论 #27975892 未加载

评论 #27977526 未加载

评论 #27974029 未加载

评论 #27974679 未加载

tabbott将近 4 年前

评论 #27978277 未加载

评论 #27975871 未加载

lettergram将近 4 年前

评论 #27974051 未加载

评论 #27975468 未加载

jcuenod将近 4 年前

评论 #27981279 未加载

评论 #27980002 未加载

bityard将近 4 年前

评论 #27976403 未加载

theandrewbailey将近 4 年前

评论 #27975165 未加载

simonw将近 4 年前

SigmundA将近 4 年前

Grimm1将近 4 年前

thom将近 4 年前

rattray将近 4 年前

评论 #27973820 未加载

syoc将近 4 年前

评论 #27975992 未加载

MushyRoom将近 4 年前

I was hyped when I found out about it a while ago. Then I wasn't anymore.When you have 12 locales (kr/ru/cn/jp/..) it's not that fun anymore. Especially on a one man project :)

评论 #27977665 未加载

评论 #27974114 未加载

kureikain将近 4 年前

pvsukale3将近 4 年前

shakascchen将近 4 年前

No fun doing it for Chinese, especially for traditional Chinese.I had to install software but on Cloud SQL you can't. You have to do it on your instances.