The prescription of changing seq_page_cost to equal random_page_cost is certainly reasonable for SSDs, but I wonder if the underlying issues aren't somewhat deeper and more interesting. One difference between a sequential scan and an index scan is the amount of data being scanned. PostgreSQL stores information horizontally as rows, and a sequential scan has to read all column values of all rows. An index scan reads through the values of a _single_ column. The 50x performance difference _might_ simply be that the whole row is 50x wider than the indexed join column.

An interesting second factor relates to the nature of SSD storage. With SSDs, a read request pulls back a 4K page even if the request was smaller. So it's not quite right to say that a sequential read and a random read cost the same on an SSD, particularly if the same 4K page must be read multiple times. I suspect that the particular index technique used by PostgreSQL tends to organize data such that successive indexed values reside in the same 4K SSD page. IOW, it's not so much that the cost of random SSD access is the same as sequential SSD access (though that's true) as it is that the PostgreSQL index mechanism doesn't require multiple reads of the same 4K page.

If a hash-based index were used instead of a btree-based index, and if the table were narrower, the sequential scan might have outperformed the index scan.
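To make that comparison concrete, here is a minimal sketch (the table and column names are made up for illustration, not from the article) that builds both index types on the same column and lets EXPLAIN show which access path the planner picks and how many buffers it touches:

  -- hypothetical table; names are illustrative only
  CREATE TABLE events (id bigserial PRIMARY KEY, app text, payload jsonb);

  -- btree (the default) keeps neighboring key values physically close,
  -- so successive index lookups often land on the same 4K page
  CREATE INDEX events_app_btree ON events (app);

  -- hash index: equality-only lookups, no ordering of neighboring keys
  CREATE INDEX events_app_hash ON events USING hash (app);

  -- compare the chosen plan and the buffer reads/hits it reports
  EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM events WHERE app = 'foo';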
For those interested in postgresql.conf tuning, I recently gave a presentation at two conferences; hope it's interesting. Slides: https://speakerdeck.com/ongres/postgresql-configuration-for-humans

[shameless plug]
Thanks for the tip. This probably affects far more people than realize it. It would be really interesting if PG could use machine learning to discover this sort of tuning on its own.
I always increase the seq scan cost, but another thing that helped me even more was updating the table statistics. That way you can make the planner better aware of your indexes.

In my case I increased STATISTICS to 5000 and the planner immediately started using the index instead of a full table scan.

https://blog.pgaddict.com/posts/common-issues-with-planner-statistics
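For reference, a minimal sketch of how that looks (table and column names are placeholders):

  -- raise the per-column statistics target (the default is 100)
  ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 5000;

  -- re-collect statistics so the planner sees the finer-grained histogram
  ANALYZE my_table;

  -- or raise default_statistics_target in postgresql.conf to affect all columns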
Unfortunately the author doesn't mention some pretty basic things: which PostgreSQL version, how much data, how much of it fits into RAM, what storage (and hardware in general), ...

If I understand it correctly, PostgreSQL was using the default configuration, which is rather inefficient and geared more toward "must be able to start anywhere" than toward performance.

Decreasing random_page_cost makes sense if you have storage that handles random I/O well (although I wouldn't go all the way to 1 even on an SSD). But who knows if the data was read from storage at all? Maybe it all fit into RAM (and just increasing effective_cache_size would be enough for the planner to realize that).
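Purely as an illustration (the right numbers depend on the hardware the author doesn't describe), this is the kind of adjustment being discussed, with made-up values for an SSD box with 16 GB of RAM:

  -- lower than the default 4.0 for SSD, but not all the way down to 1
  ALTER SYSTEM SET random_page_cost = 1.5;

  -- rough estimate of RAM available for caching (OS cache + shared_buffers)
  ALTER SYSTEM SET effective_cache_size = '12GB';

  -- both settings take effect on reload, no restart needed
  SELECT pg_reload_conf();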
Hmm... The author is filling a combo box (probably just one column), yet the query is selecting all columns (SELECT *).

I would have tried selecting just the needed column (let's call it "foo"), with the following indexes:

event_types (app, id)

prop_keys (event_id, foo)

This should cover the entire query with the indexes (i.e. allow for an index-only scan).
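A sketch of what that could look like (the schema and join condition are guesses, since the article only shows the SELECT *):

  -- covering indexes so the query can run as an index-only scan
  CREATE INDEX ON event_types (app, id);
  CREATE INDEX ON prop_keys (event_id, foo);

  -- select only the column the combo box actually needs
  SELECT DISTINCT pk.foo
  FROM prop_keys pk
  JOIN event_types et ON et.id = pk.event_id
  WHERE et.app = $1;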
You also get good results from tweaking this particular knob if you have large amounts of RAM: if your blocks are almost always in the OS cache, you are almost never actually performing random seeks, even when the PostgreSQL planner thinks you are.
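One way to sanity-check that assumption (a sketch; the table name is a placeholder) is to see how often heap reads are served from PostgreSQL's own buffer cache rather than handed to the kernel:

  -- share of heap block requests served from shared_buffers for one table
  -- (note: this only sees shared_buffers, not the OS page cache behind it)
  SELECT heap_blks_hit,
         heap_blks_read,
         round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
  FROM pg_statio_user_tables
  WHERE relname = 'my_table';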