For those unaware, zheap is a new storage engine for Postgres that handles updates and deletes differently. Currently, when you update a row in Postgres, Postgres writes a new copy of the row and marks the old one as deleted. This is done for several reasons; in particular, it makes concurrency control easier and it makes rolling back transactions easier.

The issue with this approach is that over time it leads to lots of "dead rows": deleted row versions that still take up space in your DB. Postgres has a background job called the vacuum which effectively garbage-collects dead rows. Depending on your settings, the vacuum can be a pretty expensive job, and even with the vacuum you can still wind up using a lot more space than you actually need.

Zheap addresses these problems with a different approach. When performing an update, instead of marking the old row as deleted and inserting a new one, Postgres replaces the old row in place and writes a copy of the old version to a separate undo log. This means the main table file doesn't need to grow just to keep track of dead rows.

Zheap does lead to lots of tricky scenarios. If for any reason you need an old copy of a row, you have to fetch it from the undo log. If the transaction that performed the update is rolled back, you need to put the old version back in place of the new one. This sounds straightforward, but gets really tricky really fast. For example, say I have row X that takes up 1 kB. I replace it with row Y that takes up 500 bytes. I then write a row Z, also 500 bytes, into the space freed up after Y. If the original transaction now rolls back, row X no longer fits in its original spot, because row Z is occupying part of the space X used to take up.
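To make that last scenario concrete, here's a toy sketch (my own illustration, not zheap's actual page layout; the 1200-byte page size is picked so the numbers from the example collide):

    # Toy model: rows packed into a fixed-size page; lengths in bytes.
    class Page:
        def __init__(self, size):
            self.size = size
            self.rows = {}  # row id -> length

        def free(self):
            return self.size - sum(self.rows.values())

        def write(self, row_id, length):
            # Extra space needed beyond what this row already occupies.
            needed = length - self.rows.get(row_id, 0)
            if needed > self.free():
                raise RuntimeError(f"row {row_id} ({length}B) no longer fits")
            self.rows[row_id] = length

    page = Page(size=1200)
    page.write("X", 1000)   # row X takes 1 kB
    undo = ("X", 1000)      # old version saved, as an undo log would
    page.write("X", 500)    # in-place update: X replaced by the smaller Y
    page.write("Z", 500)    # row Z moves into the freed space
    page.write(*undo)       # rollback: X needs 500B more, only 200B is free

The real system works at the level of 8 kB heap pages with item pointers rather than a simple byte pool, but the core tension is the same: undo has to restore data into space that may have been reused.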
Confusingly, there is another new PostgreSQL storage effort called "Zedstore": https://www.postgresql.org/message-id/CALfoeiuF-m5jg51mJUPm5GN8u396o5sA2AF5N97vTRAEDYac7w%40mail.gmail.com

Disclosure: I work for VMware, which is sponsoring Zedstore development via Greenplum.
There is a super informative blog post from last week at https://www.cybertec-postgresql.com/en/zheap-reinvented-postgresql-storage/ , discussed at https://news.ycombinator.com/item?id=24710765
> What happens if the new row does not fit in? In this case performance will be worse because zheap essentially has to perform a DELETE / INSERT operation which is of course not as efficient as an in-place UPDATE.

I would be wary of this. InnoDB, for example, also has an optimistic (in-place) UPDATE mode and a pessimistic (delete + insert) UPDATE mode.

Repeatedly updating the same row under the pessimistic mode would end up stalling the database, even at a rather low QPS.

https://bugs.mysql.com/bug.php?id=53825 was originally reported by Facebook 10 years ago, and is still not fixed in 8.0.21, even with VATS / CATS scheduling.

And then there is also the performance gap between the ideal case, where the undo log is still in memory, and the pathological case, where the undo log has to be fetched from disk.

The last thing Postgres needs is something that looks good on paper / in benchmarks but has a bunch of issues in production.
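One reason hot rows hurt in undo-based designs: a reader with an old snapshot has to walk back through every newer version before finding one it's allowed to see, and each hop can be a disk read once undo has been evicted from memory. A rough sketch of that walk (my own illustration, not InnoDB's or zheap's actual logic):

    # Each row version carries the timestamp of the transaction that wrote it.
    # undo_chain is ordered newest-to-oldest; walking it reconstructs history.
    def visible_version(newest, undo_chain, snapshot_ts):
        version = newest
        for older in undo_chain:
            if version["ts"] <= snapshot_ts:
                break               # this version is visible to our snapshot
            version = older         # one more hop -- possibly a disk read

        return version if version["ts"] <= snapshot_ts else None

    row = {"ts": 100, "val": "newest"}
    chain = [{"ts": 90, "val": "older"}, {"ts": 50, "val": "oldest"}]
    print(visible_version(row, chain, snapshot_ts=60))  # -> the ts=50 version

A row updated thousands of times under long-running readers makes that chain long, which is exactly the kind of stall the bug report above describes.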
It is nice that PG is getting another storage engine. I hope it will ease the implementation of column-based storage in official PG, as row-based storage is a major problem with very large datasets.
Great to see Antonin and others working on this! I did some work on some pieces of this (along with many others), and I gave a talk about an earlier prototype of the undo log machinery and how it fits at PGCon 2019 (which now seems like a whole lifetime ago!). Slides and recording here in case the background is interesting:

https://speakerdeck.com/macdice/transactions-in-postgresql-and-other-animals
This seems neat!

Last year I helped a friend diagnose an issue they had with Postgres. A database table with a scheduled full DELETE/INSERT had slowed to the point of failure. It turned out that having slightly less I/O capacity than needed caused the autovacuum process to fall further and further behind each time it ran.

My friend simply provisioned more I/O and moved on. Another option would have been to rewrite the process to naturally produce fewer dead rows. It would be great to have a third feasible option.
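For anyone who hits something similar: the standard pg_stat_user_tables view exposes dead-tuple counts and the last autovacuum time per table, which makes this kind of lag easy to spot. A minimal check (the "dbname=mydb" DSN is a placeholder for your own connection settings):

    # Requires psycopg2; the DSN below is a placeholder for your environment.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")
    with conn, conn.cursor() as cur:
        cur.execute("""
            SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
            FROM pg_stat_user_tables
            ORDER BY n_dead_tup DESC
            LIMIT 10
        """)
        for rel, live, dead, last in cur.fetchall():
            print(f"{rel}: {dead} dead vs {live} live tuples, "
                  f"last autovacuum {last}")

If n_dead_tup keeps climbing between runs while last_autovacuum barely advances, autovacuum is losing the race.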
This should solve one of the problems Uber had with PostgreSQL, reported by Christophe Pettus back in 2017 [1].

[1] https://thebuild.com/presentations/uber-perconalive-2017.pdf

Previous discussion, from 2018: https://news.ycombinator.com/item?id=16526623
I wonder how this approach will compare with LSM-tree-style storage, which has gained popularity recently (mostly via RocksDB). LSM trees are also write-optimized and allow for some optimizations by batching up updates and merging them in bulk.
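For anyone unfamiliar, the core LSM idea is small, even if real engines like RocksDB add WALs, bloom filters, and leveled compaction on top. A toy illustration (mine, nothing like RocksDB's actual implementation): absorb writes in a sorted in-memory buffer, flush it as an immutable sorted run, and have reads consult runs newest-to-oldest:

    class TinyLSM:
        def __init__(self, memtable_limit=4):
            self.memtable = {}
            self.memtable_limit = memtable_limit
            self.runs = []                  # immutable sorted runs, newest first

        def put(self, key, value):
            self.memtable[key] = value      # repeated updates batch in memory
            if len(self.memtable) >= self.memtable_limit:
                self.runs.insert(0, dict(sorted(self.memtable.items())))
                self.memtable = {}          # flush as an immutable sorted run

        def get(self, key):
            if key in self.memtable:
                return self.memtable[key]
            for run in self.runs:           # newest version wins
                if key in run:
                    return run[key]
            return None

    lsm = TinyLSM()
    for i in range(6):
        lsm.put(f"k{i}", i)                 # the fourth put triggers a flush
    lsm.put("k0", 99)                       # newer value shadows the flushed one
    print(lsm.get("k0"), lsm.get("k5"))     # -> 99 5

The contrast with zheap's design: LSM trees never update in place at all, trading read-path merging for purely sequential writes, while zheap tries to keep reads cheap by keeping the newest version in place and pushing history into undo.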