TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

How a single PostgreSQL config change improved slow query performance by 50x

218 pointsby ryanashcraftover 7 years ago

9 comments

quizoticover 7 years ago
The prescription of changing sequential_page_cost to equal random_page_cost is certainly reasonable for SSD, but I wonder if the underlying issues aren&#x27;t somewhat deeper and more interesting. One difference between a sequential scan and an index scan is the amount of data being scanned. PostgreSQL stores information horizontally as rows and a sequential scan will have to read in all column values of all rows. An index scan will read through all values of a _single_ column. The 50x performance difference _might_ be just that the whole row is 50x wider than the width of the indexed join column.<p>An interesting second factor relates to the nature of the SSD storage. With SSDs a read request will pull back a 4K page, even if the read request was smaller. So it&#x27;s not quite right to say that a sequential read and a random read cost the same on SSD, particularly if the same 4K page must be read multiple times. I suspect that the particular index technique used by PostgreSQL tends to organize data such that successive indexed values reside in the same 4K SSD page. IOW, it&#x27;s not so much that the cost of random SSD access is the same as sequential SSD access (though that&#x27;s true), as it is that the PostgreSQL index mechanism doesn&#x27;t require multiple reads of the same 4K page.<p>if a Hash-based index was used instead of a Btree-based index, and if the table width was narrower, the sequential scan might have outperformed the index scan.
评论 #15764977 未加载
评论 #15765610 未加载
评论 #15764365 未加载
ahacheteover 7 years ago
For those interested in postgresql.conf tuning, I did a recent presentation at two conferences recently, hope it&#x27;s interesting. Slides: <a href="https:&#x2F;&#x2F;speakerdeck.com&#x2F;ongres&#x2F;postgresql-configuration-for-humans" rel="nofollow">https:&#x2F;&#x2F;speakerdeck.com&#x2F;ongres&#x2F;postgresql-configuration-for-...</a><p>[shamless plug]
评论 #15764667 未加载
评论 #15764162 未加载
danjocover 7 years ago
Thanks for the tip. This probably affects more people than there are people who realize it. It would be really interesting if PG would use machine learning to discover this sort of tuning on its own.
评论 #15764191 未加载
评论 #15765747 未加载
评论 #15762921 未加载
olavggover 7 years ago
I always increase the seq scan cost, but another thing that helped me more was updating the table statistics. In that way you can make the planner better aware of your indexes.<p>In my case I increased STATISTICS to 5000 and the planner immediately start using the index instead of full table scan.<p><a href="https:&#x2F;&#x2F;blog.pgaddict.com&#x2F;posts&#x2F;common-issues-with-planner-statistics" rel="nofollow">https:&#x2F;&#x2F;blog.pgaddict.com&#x2F;posts&#x2F;common-issues-with-planner-s...</a>
评论 #15764961 未加载
评论 #15764580 未加载
pgaddictover 7 years ago
Unfortunately the author does not say some pretty basic things - which PostgreSQL version, how much data, how much of it fits into RAM, what storage (and hardware in general) ...<p>If I understand it correctly, PostgreSQL was using the default configuration. Which is rather inefficient, and is more about &quot;must start everywhere&quot;.<p>Decreasing random_page_cost makes sense if you have storage that can handle random I&#x2F;O well (although I wouldn&#x27;t go to 1 even if it&#x27;s an SSD). But who knows if the data was read from storage at all? Maybe it&#x27;d fit into RAM (and just increasing effective_cache_size would be enough for the planner to realize that).
评论 #15765290 未加载
branko_dover 7 years ago
Hmm... The author is filling a combo-box (probably just one column), yet the query is selecting all columns (SELECT *).<p>I would have tried selecting just the needed column (let&#x27;s call it &quot;foo&quot;), with following indexes:<p>event_types (app, id)<p>prop_keys (event_id, foo)<p>This should cover the entire query with indexes (i.e. allow for index-only scan).
stubishover 7 years ago
You also get good results tweaking this particular knob if you have large amounts of RAM. If your blocks are almost always in OS cache, you are almost never going to make random seeks even if the PostgreSQL planner thinks you are.
评论 #15764264 未加载
topbananaover 7 years ago
This seems like something that should be measured at startup?
评论 #15806759 未加载
emilfihlmanover 7 years ago
Why aren&#x27;t these tuned automatically? Should be pretty easy.
评论 #15764954 未加载