The part of Postgres we hate the most: Multi-version concurrency control

257 点作者 andrenotgiant大约 2 年前

22 条评论

mmaunder大约 2 年前

I must admit as a web practitioner since 1994 I have a bit of an issue with this:> In the 2000s, the conventional wisdom selected MySQL because rising tech stars like Google and Facebook were using it. Then in the 2010s, it was MongoDB because non-durable writes made it “webscale“. In the last five years, PostgreSQL has become the Internet’s darling DBMS. And for good reasons!Different DB's, different strengths and it's not a zero sum came as implied. MySQL was popular before Google was born - we used it heavily at eToys in the 90s for massive transaction volume and replacing it with Oracle was one of the reasons for the catastrophic failure of eToys circa 2001. MongoDB gained traction not because it's an alternative to MySQL or PostgreSQL. And PostgreSQL's marketshare today is on a par with Mongo and both are dwarfed by MySQL which IMO is the true darling of web DB's given it's global popularity.

评论 #35720854 未加载

评论 #35721593 未加载

评论 #35721399 未加载

评论 #35719169 未加载

评论 #35731932 未加载

评论 #35722762 未加载

评论 #35722020 未加载

评论 #35723786 未加载

评论 #35721446 未加载

评论 #35730367 未加载

评论 #35721330 未加载

fdr大约 2 年前

One of the weird things about Postgres MVCC is that it is "optimized for rollback," as one person memorably quipped to me. This is not to imply a design principle, it's more a description of how things ended up, and the general argument behind this quip is Postgres lacks "UNDO" segments.On the one hand, this does make the model Postgres uses admirably simple: the WAL is all "REDO," and the heap is all you need to accomplish any kind of read, but at the expense that stuff that normally would be copied off to a sequential UNDO log and then vaporized when the transaction commits and all possible readers have exited remains comingled with everything else in the main database heap, needing to be fished out again by VACUUM for purging and figuring out how to reclaim numerical space for more transactions.There may be other solutions to this, but it's one unusual quality Postgres has relative to other MVCC databases, many of which sport an UNDO log.There are downsides to UNDO, however: if a read needs an old copy of the tuple, it needs to fish around in UNDO, all the indices and synchronization need to account for this, and if there's a rollback or crash recovery event (i.e. mass-rollback of all transactions open at the time), everything has to be shuffled back into the main database storage. Hence the memorable initial comment: "Postgres is optimized for rollback."

评论 #35721603 未加载

cyberax大约 2 年前

Yup. A lot of heavy users of Postgres eventually hit the same barrier. Here's another take from Uber: <a href="https://www.uber.com/blog/postgres-to-mysql-migration/" rel="nofollow">https://www.uber.com/blog/postgres-to-mysql-migration/</a>I had a similar personal experience. In my previous job we used Postgres to implement a task queuing system, and it created a major bottleneck, resulting in tons of concurrency failures and bloat.And most dangerously, the system failed catastrophically under load. As the load increased, most transactions ended up in concurrent failures, so very little actual work got committed. This increased the amount of outstanding tasks, resulting in even higher rate of concurrent failures.And this can happen suddenly, one moment the system behaves well, with tasks being processed at a good rate, and the next moment the queue blows up and nothing works.I re-implemented this system using pessimistic locking, and it turned out to work much better. Even under very high load, the system could still make forward progress.The downside was having to make sure that no deadlocks can happen.

评论 #35721940 未加载

评论 #35721568 未加载

elijaht大约 2 年前

> Another problem with the autovacuum in PostgreSQL is that it may get blocked by long-running transactions, which can result in the accumulation of more dead tuples and stale statistics. Failing to clean expired versions in a timely manner leads to numerous performance problems, causing more long-running transactions that block the autovacuum process. It becomes a vicious cycle, requiring humans to intervene manually by killing long-running transactions.Oh man, a previous company I worked at had an issue with a hot table (frequent reads + writes) interfering with autovacuum. Many fires over a six month period arose from all of that. I was (luckily) only on an adjacent team, so I don't know the details, other than vacuums taking over 24 hours! I'm sure it could have been prevented, but it seemed horrible to debug

评论 #35720645 未加载

dale_glass大约 2 年前

That's interesting, MVCC was the thing that drew me to Postgres to begin with!Way back I was working on an in-house inventory app written in Visual Basic against SQL Server 2000, I think. That one just put locks on tables. It had the "charming" characteristic of that if you weren't very, very careful with Enterprise Manager, loading a table in the GUI put a lock on it and just keep on holding it until that window was closed.Then the running app would eventually snag on that lock, maybe keep holding some other lock that something else would snag on, and 5 minutes later I'd hear one of the operators screaming "Nothing is working! I can't take any orders!" from the room next to me.

评论 #35719580 未加载

评论 #35722059 未加载

评论 #35723607 未加载

garyclarke27大约 2 年前

Clever Clickbait - Of course at the end of the article they offer a solution - their product (and of course it’s AI enhanced) to the problem they have overhyped.

评论 #35719273 未加载

评论 #35720967 未加载

评论 #35718386 未加载

评论 #35720197 未加载

评论 #35718800 未加载

spprashant大约 2 年前

This post has a valid point. But the last line makes it clear why they care so much about it.Yeah, table bloat and transaction ID wraparounds are terrible, but easily avoidable if you follow a few simple guidelines. Typically in my experience, best way to avoid these issues are to set sensible vacuum settings and track long running queries.I do hate the some of the defaults in the Postgres configuration are too conservative for most workloads.

评论 #35720793 未加载

评论 #35718567 未加载

评论 #35720107 未加载

评论 #35720733 未加载

Max-Ganz-II大约 2 年前

MVCC for Amazon Redshift;(pdf) <a href="https://www.redshiftresearchproject.org/white_papers/downloads/multi_version_concurrency_control_and_serialization_isolation_failure.pdf" rel="nofollow">https://www.redshiftresearchproject.org/white_papers/downloa...</a>(html) <a href="https://www.redshiftresearchproject.org/white_papers/downloads/multi_version_concurrency_control_and_serialization_isolation_failure.html" rel="nofollow">https://www.redshiftresearchproject.org/white_papers/downloa...</a>I've been told, very kindly, by a couple of people that it's the best explanation they've ever seen. I'd like to get more eyes on it, to pick up any mistakes, and it might be useful in and of itself anyway to reader, as MVCC on Redshift is I believe the same as MVCC was on Postgres before snapshot isolation.

评论 #35719074 未加载

评论 #35718538 未加载

audioheavy大约 2 年前

My main takeaway from this article: as popular as Postgres and MySQL are, and understanding the legacy systems built for them, it will always require deep expertise and "black magic" to achieve enough performance and scale for hyper scale use cases. It justifies the (current) trend to have DB's built for distributed tx/writes/reads that you don't have to become a surgeon to scale. There are other DBs and DBaaS that, although not OSS, have solved this problem in a more cost-efficient way than having a team of surgeons.

评论 #35719744 未加载

zinclozenge大约 2 年前

I guess the question is, which MVCC strategy would be the "right" one to pick for a modern relational database? The paper linked focuses on main memory databases, and being main memory allows you to do things you can't do when disk based.

评论 #35723502 未加载

jasonhansel大约 2 年前

IMHO part of the issue is that Postgres was built on the assumption that snapshot isolation would be widely used. I don't think this has proven to be the case.Snapshot isolation isn't as robust and straightforward as strict serializability, but it also isn't as performant as READ COMMITTED. It seems like the worst of both worlds.

评论 #35725560 未加载

评论 #35733359 未加载

runlaszlorun大约 2 年前

As an aside, Andy Pavlo (one the authors here) has his CMU database course videos up on YouTube and they are tremendous. I’ve spent 2 decades developing web applications but am not exaggerating when I say that I’m 10x more knowledgable on databases having watched his courses during Covid.

评论 #35723924 未加载

avinassh大约 2 年前

This was a fun read. But now I have a couple of questions1. Since MySQL keeps delta to save storage costs, wouldn't read and writes slower because now I have to build the full version from the delta2. On secondary indexes, they highlight the reads will be slower and also say:> Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.What are the other advantages they have to make reads faster?Compared to MySQL, I remember reading that Postgres MVCC lets you alter the table without locking. Now I found out that MySQL also does not require locks. So, how are they doing?Are there any similar posts which explain MySQL MVCC architecture?

评论 #35724953 未加载

h1fra大约 2 年前

Once you figured out all the point in this article, it's a matter of fine tuning, can take some times but eventually it will works. The only thing I still struggle with is the Table Bloat.On managed Postgres (i.e: gcp, aws) you pay for the disk, but when you can't run a VACUUM FULL because it locks the table, you end up with a lot of allocated storage for nothing and you can't shrink the disk size (at least on gcp). Storage is cheap but still feels like a waste.

评论 #35719721 未加载

Ym9oYW4大约 2 年前

So why Postgres chooses the worst MVCC design compared to MySQL and Oracle? Is this because of legacy reasons or other factors?

评论 #35717429 未加载

no_wizard大约 2 年前

Can the MVCC implementation be swapped via Postgres extensions?

评论 #35718071 未加载

评论 #35718044 未加载

评论 #35718302 未加载

vp8989大约 2 年前

Am I correct in thinking that PG's MVCC implementation results in a worse story around offloading some mild OLAP workloads to a replica without affecting the primary? Anecdotally, it seems that MySQL handles this better but I don't understand the internals of both enough to explain why that is.<a href="https://aws.amazon.com/blogs/database/manage-long-running-read-queries-on-amazon-aurora-postgresql-compatible-edition/" rel="nofollow">https://aws.amazon.com/blogs/database/manage-long-running-re...</a>

umairshahid大约 2 年前

Here is a rebuttal of many of the points raised by Uber against PostgreSQL: <a href="https://www.2ndquadrant.com/en/blog/thoughts-on-ubers-list-of-postgres-limitations/" rel="nofollow">https://www.2ndquadrant.com/en/blog/thoughts-on-ubers-list-o...</a>

zeroimpl大约 2 年前

> Oracle and MySQL do not have this problem in their MVCC implementation because their secondary indexes do not store the physical addresses of new versions. Instead, they store a logical identifier (e.g., tuple id, primary key) that the DBMS then uses to look up the current version’s physical address.This doesn’t have anything to do with MVCC. I’m sure PostgreSQL could implement an index format that piggybacks on another index rather than pointing at the physical page directly, without overhauling MVCC.

评论 #35747341 未加载

Temporary_31337大约 2 年前

I work for a db vendor and Acid compliance (also implemented with MVCC) is a big selling point. Yet, most use cases I later see don’t require such rigid controls on updates. This means customers are paying for this as transactionally consistent updates are more expensive than eventually consistent ones.

kerblang大约 2 年前

Question: Why would I need more than one extra version of the same row? I would think that with transactional locking everybody else is waiting on the first update to commit before getting their own changes in, unless the db is somehow trying to lock columns-per-row instead of entire rows.

评论 #35722015 未加载

评论 #35721944 未加载

jonsnow_dba大约 2 年前

We have MySQL/MariaDB in RDS and ever since we migrated MariaDB to 10.6.12 we get at least 1 table corruption per day. Only work around available is to restart database just like windows-95.