TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Ten years of improvements in PostgreSQL's optimizer

299 pointsby samaysharmaabout 1 year ago

11 comments

darksaintsabout 1 year ago
As someone who has worked with Postgres for 15 years and also has spent most of my career modeling and solving mathematical optimization problems, I have so much to say on this topic, but I&#x27;ll leave it at these three points:<p>* Every optimization problem needs data on costs, and the more and better the data is, the better. Postgres has made some improvements here, especially with cross column statistics, but there are still massive improvements left on the table. The most glaring omission is data on syscall latencies. Reading a page from disk has dramatically different latencies from system to system, and postgres still relies on configuration values for these costs, when it could very easily be measuring them. Another omission is foreign key statistics. Joins along foreign keys should never have a bad plan, but they still occasionally do.<p>* Deferred or alternative scenario planning should be adopted, especially for large and expensive queries. As it is today, your plan is finalized before it is executed, even though earlier stages of execution could provide information (like rowcounts or cardinality estimates) that could dramatically improve later stage plans.<p>* Machine learning most definitely could be an area where improvements could be made, but I&#x27;ve been unimpressed with the efforts that Ive seen. Don&#x27;t use machine learning for planning, use machine learning for cost discovery and estimation. Build better cost models, and then let the optimization engine work with that data.
评论 #40066557 未加载
评论 #40070935 未加载
评论 #40068952 未加载
评论 #40067981 未加载
DoubleFreeabout 1 year ago
The postgres query optimizer will try to minimize the number of pages read from disk (and the number of intermediate pages written to disk). Benchmarking the query optimizer by making the shared buffers large enough to hold all the data therefore seems wrong, as you&#x27;re then measuring the speed of the query optimizer and the join processor, instead of the quality of the generated query plans. It would not surprise me if the generated plans for these versions are actually all the same and this is only measuring execution speed.
评论 #40066088 未加载
评论 #40066424 未加载
BitPirateabout 1 year ago
The author mentions PostgreSQL&#x27;s JIT compiler. Up to this day, I&#x27;ve only seen it degrade the performance of queries. Disabling it is on my install checklist.
评论 #40062262 未加载
评论 #40061567 未加载
评论 #40062319 未加载
评论 #40068008 未加载
uhoh-itsmaciekabout 1 year ago
Neat, but Postgres version numbering changed with v10. 9.6, 9.5, 9.4, 9.3, 9.2, 9.1, 9.0, 8.4, 8.3, 8.2, 8.1, and 8.0 are effectively all distinct major versions. It&#x27;d be interesting to see how perf changed with those.
评论 #40061549 未加载
评论 #40066317 未加载
uhoh-itsmaciekabout 1 year ago
&gt;Of course, not all of these improvements are attributable to the query optimizer.<p>It would be interesting to see plan changes, if any, across versions.
nsilvestriabout 1 year ago
Proebsting&#x27;s Law comes to mind: <a href="https:&#x2F;&#x2F;proebsting.cs.arizona.edu&#x2F;law.html" rel="nofollow">https:&#x2F;&#x2F;proebsting.cs.arizona.edu&#x2F;law.html</a>
评论 #40061896 未加载
评论 #40060887 未加载
评论 #40060585 未加载
评论 #40060832 未加载
评论 #40060583 未加载
mehulashahabout 1 year ago
I’m a bit confused with the analysis here. How do you confirm a downward trend in the data that is not visible on the graph. It looks like the median drops a little in the first few versions, but then rises in the last few. The R2 is really low, so I’m not sold on the correlation here. Basically, it looks like tail latency has improved and YMMV for everything else.
评论 #40066264 未加载
edweisabout 1 year ago
How query optimisation looks like? Does it optimize on the SQL or algorithm level?
评论 #40061274 未加载
评论 #40061118 未加载
评论 #40061495 未加载
评论 #40070583 未加载
moreporkabout 1 year ago
Site seems to be down, you can try this instead: <a href="https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20240417050840&#x2F;https:&#x2F;&#x2F;rmarcus.info&#x2F;blog&#x2F;2024&#x2F;04&#x2F;12&#x2F;pg-over-time.html" rel="nofollow">https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20240417050840&#x2F;https:&#x2F;&#x2F;rmarcus.i...</a>
CodePhoenixabout 1 year ago
I&#x27;m not an expert in developing databases, but I understand that creating an excellent optimizer is quite challenging. Having followed your blog, I believe it will help me to enhance my knowledge in the area of databases.
throwaway984393about 1 year ago
It would arguably be more accurate to compare the actual systems running Postgres and their precompiled binaries. Nearly everything else in a given system running a given version of an application may have an impact on its performance in subtle ways. Its development occurred in conjunction with these other system components, thus it was optimized for them. In addition, distributions tend to ship their binaries with patches not found upstream which can also affect performance. Everything from the kernel and scheduler to glibc and more may affect performance. Newer isn&#x27;t always faster. Even the containerization you now use might add a subtle penalty that an older non-containered version may not experience.<p>Basically, it&#x27;s theoretically possible that some old ass system running version 8 runs it faster than your current system runs version 8 (excluding hardware).<p>Add to that all the other usual caveats about benchmarks etc (like the fact that IO wasn&#x27;t tested here but is pretty dang relevant to real world performance)