> A key gotcha that tripped us up: when querying with a list of primary keys, Postgres will not return the records in the same order as the IDs provided in a “where … in” clause.

That's frightening that they didn't know that. So, burn this into your minds: no ordering is guaranteed in SQL unless an ORDER BY is applied (and any ordering in a subquery is lost). Even if it seems to work today, it will eventually break. No guarantees on order unless ORDER BY... scratch that onto your eyeballs so you never forget it.

Also, will people please stop posting rainbow-on-black screenshots, especially with the screenshotted text smaller than the main text.
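If you need the rows back in the order of your ID list, one option (a minimal sketch, assuming a hypothetical items table) is to sort by each ID's position in the input array:

```sql
-- Rows can come back in any order here:
SELECT * FROM items WHERE id IN (42, 7, 19);

-- Force the input order explicitly (array_position is available since 9.5):
SELECT *
FROM items
WHERE id = ANY (ARRAY[42, 7, 19])
ORDER BY array_position(ARRAY[42, 7, 19], id);
```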
How can querying for PKs, then doing a second query on the FK, be faster than a join that's semantically equivalent? Postgres optimizer gone terribly wrong or something? Or am I misunderstanding what's going on?

It would never have occurred to me to even try this; I would have assumed Postgres would do as well as could be done on what looks like a very standard join. So I guess that's a lesson not to make assumptions?

Not sure. Without even having a theory of why pg would behave this way, I'd be worried that as the data changes it might regress again. Or if the phase of the moon changes or something -- since it seems like magic!
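For anyone else trying to picture it, here's a rough sketch of the two shapes being compared (table and column names are hypothetical):

```sql
-- One semantically equivalent join:
SELECT c.*
FROM comments c
JOIN documents d ON c.document_id = d.id
WHERE d.search_vector @@ websearch_to_tsquery('english', 'foo bar');

-- vs. two round trips: fetch the matching PKs first...
SELECT id
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', 'foo bar');

-- ...then query the child table by FK with the IDs from the application:
SELECT * FROM comments WHERE document_id IN (1, 2, 3);
```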
I wrote up my recent experience optimizing Postgres text search on a database with a few million records, without relying on a new service like Elasticsearch.
Has anyone overcome the 16383 position limit of tsvector?

That, and the automatic stemming and lemmatization of search words even in phrase searches, makes Postgres awful for any software where accurate search is critical.
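For the stemming problem, one partial workaround (a sketch, not a full fix) is to index and query with the built-in simple configuration, which lowercases tokens but doesn't stem them or drop stopwords:

```sql
-- The default english config stems words and drops stopwords:
SELECT to_tsvector('english', 'The quick brown foxes');
-- => 'brown':3 'fox':4 'quick':2

-- The simple config keeps the exact (lowercased) tokens:
SELECT to_tsvector('simple', 'The quick brown foxes');
-- => 'brown':3 'foxes':4 'quick':2 'the':1
```

You have to use the same configuration on the query side, of course, or nothing will match.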
PostgreSQL text search is awesome - for English and other Latin-script languages.

But Asian languages such as Thai, Japanese, Korean, etc. are not going to work at all.

PostgreSQL is weird about joins. Joining on certain columns can be super fast while joining on others is dog slow. And this can flip depending on table size and index cardinality.

That's why it's important, on databases that grow quickly, to check the performance of even simple queries, as those can balloon in execution time as the profile of the data changes.
I'd love to see some details on the why, with EXPLAIN ANALYZE output for each query plus the schema. It seems like the changes were made on a hunch as to why they were slow?
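For reference, something like this (the query itself is hypothetical) shows the actual plan, timings, and buffer hits instead of guesses:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', 'foo bar');
```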
PGSync might be useful for those who don't mind also running Elasticsearch:

https://github.com/toluaina/pgsync
I remember achieving some quite good results when implementing a simple "fuzzyesque" search for a B2B e-commerce system a few years back, but what hit us hard was the German dictionary used for stemming in the tsvector index. The one shipping with Postgres did not handle German compound words correctly, and the "fix" was to import a third-party one. I learned the hard way that this comes at a cost: Postgres loads the dictionary every time a new connection uses the index. We were running a simple, small PHP app that could not reuse/pool connections, so every search query came in on a fresh connection and paid a >1s fee to load the dict.

IIRC pgbouncer saved the day there.

Otherwise it worked fine.
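For anyone facing the same thing, registering a third-party Ispell/Hunspell dictionary looks roughly like this (a sketch: the .dict/.affix/.stop files must first be copied into Postgres's $SHAREDIR/tsearch_data directory, and german_hunspell / de_compound are names I made up):

```sql
-- Register the third-party dictionary files:
CREATE TEXT SEARCH DICTIONARY german_hunspell (
    TEMPLATE  = ispell,
    DictFile  = german,   -- german.dict
    AffFile   = german,   -- german.affix
    StopWords = german    -- german.stop
);

-- Use it in a copy of the built-in german configuration,
-- falling back to the stemmer for words the dictionary misses:
CREATE TEXT SEARCH CONFIGURATION de_compound (COPY = pg_catalog.german);
ALTER TEXT SEARCH CONFIGURATION de_compound
    ALTER MAPPING FOR asciiword, word
    WITH german_hunspell, german_stem;
```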
You might want to test using the first query as a sub-query or CTE in the second one. That would likely give you the same or better perf. It would avoid the join and save a round trip.
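Something like this, for instance (a sketch with hypothetical table names, using IN rather than an explicit join):

```sql
-- Fold the PK lookup into the second query as a CTE,
-- so everything happens in one round trip:
WITH matched AS (
    SELECT id
    FROM documents
    WHERE search_vector @@ websearch_to_tsquery('english', 'foo bar')
)
SELECT *
FROM comments
WHERE document_id IN (SELECT id FROM matched);
```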