If Eventual Consistency Seems Hard, Wait Till You Try MVCC

121 points by boynamedsue over 10 years ago

18 comments

AlisdairO over 10 years ago

I certainly wouldn't dispute that it can be hard to reason about concurrent operations in traditional database systems, particularly if you want to do it across multiple database systems, which do indeed have differing implementations. That said, if the author thinks eventual consistency is easy/not-that-bad/whatever by comparison, I would tend to question their understanding of programming for eventual consistency. Eventual consistency for non-trivial applications is really, really, really hard to get right.

Consistency in relational databases is a hard topic because maintaining consistency in any concurrent system is hard. Traditional systems provide you with a set of tools (isolation levels, foreign keys) that make a decent programmer capable of building a safe concurrent application. Throwing away those tools and replacing them with nothing does not make life easier. The tool is easier to understand, but the problem is harder to solve.
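
The "any concurrent system is hard" point can be made concrete with the classic lost update, the simplest anomaly that isolation levels exist to prevent. A minimal, database-free Python sketch; the function names are hypothetical and the interleaving is hand-scheduled rather than produced by real threads:

```python
def lost_update(store, key, amount_a, amount_b):
    """Hand-scheduled interleaving of two read-modify-write clients with no
    isolation: both read before either writes (the lost-update schedule)."""
    read_a = store[key]             # client A reads
    read_b = store[key]             # client B reads the same committed value
    store[key] = read_a + amount_a  # A writes its result
    store[key] = read_b + amount_b  # B overwrites A's write; A's update is lost
    return store[key]


def serial_updates(store, key, amount_a, amount_b):
    """The same two updates run one after the other (a serial schedule)."""
    store[key] = store[key] + amount_a
    store[key] = store[key] + amount_b
    return store[key]


print(lost_update({"balance": 100}, "balance", 10, 20))     # 120
print(serial_updates({"balance": 100}, "balance", 10, 20))  # 130
```

Under serializable isolation the database has to reject one of the two clients (or make it wait) rather than accept the first schedule.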

jeffdavis over 10 years ago

PostgreSQL supports true serializability, while maintaining good performance and concurrency: http://www.postgresql.org/docs/9.4/static/transaction-iso.html#XACT-SERIALIZABLE

It's based on fairly recent research by Michael J. Cahill, et al.

It's simple. If you are confused, then set default_transaction_isolation = 'serializable'. You will get serialization errors if there's a data race.

Given that, what's the point of the article? That sub-SERIALIZABLE modes have complex semantics? Yes, that's true, but they are still much more likely to help you than the NoSQL "you're on your own to avoid races" approach.

If you want to avoid lots of really challenging problems, PostgreSQL is often the best bet by far.
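
"You will get errors" implies the application has to retry. PostgreSQL reports a serialization failure with SQLSTATE 40001, and its documentation advises retrying the transaction. A minimal Python sketch of such a retry loop; `SerializationFailure` is a hypothetical stand-in for whatever error type your driver raises, and no real driver API is used:

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for the driver error that carries SQLSTATE 40001
    (serialization_failure); real drivers expose this in their own way."""


def run_serializable(txn, max_attempts=5, base_delay=0.01):
    """Call txn(), re-running it from the start on serialization failure.

    txn must be safe to re-run: in a real client it would open a
    SERIALIZABLE transaction, do all of its reads and writes, and commit.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            # Randomized backoff so conflicting clients don't retry in lockstep.
            time.sleep(base_delay * attempt * random.random())


# Usage: a transaction that loses two conflicts before it commits.
calls = []

def transfer():
    calls.append("attempt")
    if len(calls) < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"

result = run_serializable(transfer)
print(result, len(calls))  # committed 3
```

The key design choice is that the whole transaction is re-run, reads included; retrying only the commit would reuse values read from a snapshot that already lost the conflict.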

Dave_Rosenthal over 10 years ago

I agree with the author that the various levels of isolation, etc. within the current crop of SQL databases are a morass. I’ll point to some recent fine work by Martin Kleppmann (https://github.com/ept/hermitage) that explores the issue and shows how many systems fall short of serializable isolation in their default modes. (And sometimes in modes labeled “serializable”!) In his tests, three databases actually achieve full serializability: PostgreSQL, MS SQL Server, and FoundationDB.

But don’t give up on ACID yet! If you can actually get real serializability, you have a very powerful model that is also very simple to reason about. Serializable isolation gives you a virtual “whole database lock” that lets you modify lots of things all over the database before you “unlock” it and hand the database to the next client. The amazing thing about MVCC combined with serializable isolation is that you get to use this "simplest possible" mental model even though hundreds or thousands of concurrent clients might be all hitting the database at the same time.
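
The “whole database lock” is the programmer's view, not the engine's. A toy Python sketch of that view (class and function names are hypothetical): every transaction literally takes one global lock, so a read-modify-write can never interleave with another. An MVCC engine with serializable isolation aims to give the same observable results while actually running transactions concurrently, aborting any that would break the illusion.

```python
import threading


class WholeDatabaseLock:
    """Toy version of the serializable mental model: each transaction runs
    as if it held an exclusive lock on the entire database."""

    def __init__(self, data):
        self._data = dict(data)
        self._lock = threading.Lock()

    def transact(self, fn):
        # fn gets exclusive access to the whole store until it returns.
        with self._lock:
            return fn(self._data)

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def deposit(data, amount):
    # Read-modify-write that would lose updates without isolation.
    balance = data["balance"]
    data["balance"] = balance + amount


db = WholeDatabaseLock({"balance": 0})

def client(n):
    for _ in range(n):
        db.transact(lambda data: deposit(data, 1))

threads = [threading.Thread(target=client, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db.snapshot()["balance"])  # 8000
```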

quizotic over 10 years ago

One of the functions of an Operating System is to provide (the illusion of) isolation between Processes and the resources they acquire. Roughly,

Transaction : Database :: Process : OperatingSystem

The four transaction isolation levels, defined nearly fifty years ago, were an attempt at categorizing incomplete isolation. Can you imagine launching an operating system process and saying "It's OK if my memory locations are changed by another process" or "Don't let my memory locations be changed by another process, but it's OK if another process prevents me from deleting a resource"? That's what we're asking of our non-serializable database transactions.

Why not just provide strong isolation? Partly because it is hard and partly because the performance impact of strong transaction isolation is greater than the performance impact of process context switching.

But if you're living with less than strong transaction isolation, then tautologically, strange and unexpected things eventually happen: seeing state that never existed, seeing state changes that you didn't make, failing to see state that you should have seen. Rarely, the application can reliably detect and handle some of these situations. Typically, the application only thinks it can.

Sometimes, I think the core issue is with the notions of isolation and serializability themselves. Developers want to believe that events are noticed simultaneously with their occurrence, and that all observers (transactions) see the same history unfold in the same order. But the pesky physical world doesn't work that way.
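
"Seeing state that never existed" has a concrete form in write skew, an anomaly that snapshot isolation permits. The toy store below (all names hypothetical, not any real database's implementation) gives each transaction a snapshot taken at begin and, at commit, checks only for overlapping writes (first committer wins). Two transactions that each check an invariant on a key they don't write can then both commit:

```python
class WriteConflict(Exception):
    pass


class SnapshotStore:
    """Toy snapshot-isolation store: each transaction reads from a copy of
    the committed state taken at begin(); at commit, only overlapping
    *writes* are checked (first committer wins)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = 0
        self.last_write = {}  # key -> version that last committed a write to it

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version
        self.snapshot = dict(store.data)
        self.writes = {}

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.snapshot[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        s = self.store
        for key in self.writes:
            if s.last_write.get(key, 0) > self.start_version:
                raise WriteConflict(key)
        s.version += 1
        for key, value in self.writes.items():
            s.data[key] = value
            s.last_write[key] = s.version


# Invariant the application wants: at least one doctor stays on call.
store = SnapshotStore({"alice_on_call": True, "bob_on_call": True})
t1, t2 = store.begin(), store.begin()

def go_off_call(txn, me, colleague):
    # The check reads a key this transaction never writes.
    if txn.read(colleague):
        txn.write(me, False)

go_off_call(t1, "alice_on_call", "bob_on_call")
go_off_call(t2, "bob_on_call", "alice_on_call")
t1.commit()
t2.commit()  # no WriteConflict: the two write sets don't overlap
print(store.data)  # {'alice_on_call': False, 'bob_on_call': False}
```

Each transaction saw a consistent snapshot and neither overwrote the other, yet the committed result (nobody on call) is one that no serial order could produce. A serializable implementation, such as PostgreSQL's SERIALIZABLE level, is expected to abort one of the two instead.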

nickik over 10 years ago

While reading this I was thinking about Datomic. I know that transactions are serialisable and actually serialised and stored in the database. Essentially what you get is a full DB lock and you can access the full database, or even do whatever you want. That of course does lock the database transactor, but reading can still go on consistently.

Wasn't that also the finding of the VoltDB research: that coordination is the problem, and that you have to do all transactions on a single core? Am I remembering this correctly?

Seems to me Datomic hits a very nice spot where you have relatively fast writes and conceptually unlimited reads.

If somebody knows more than me, I'm happy to learn.

PS: Also, why are we still using the term NoSQL and making generalisations about it? By now there are so many NoSQL databases that they have literally nothing in common except that they're not traditional SQL databases.
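
The shape described here (writes serialized through a single transactor, readers working against consistent values that never wait on it) can be sketched as a toy. This is not Datomic's implementation or API; `SingleTransactor`, `commit`, and `snapshot` are hypothetical names, and a real system would share structure between versions instead of copying the whole map on every commit:

```python
import threading
from types import MappingProxyType


class SingleTransactor:
    """Toy single-writer database: commits are serialized through one lock
    and recorded in an append-only log, and every committed state is kept
    as an immutable value that readers can hold without taking the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []                          # (t, changes) per commit
        self._states = [MappingProxyType({})]  # _states[t] = state after commit t

    def commit(self, changes):
        with self._lock:  # only one writer at a time
            new_state = dict(self._states[-1])
            new_state.update(changes)
            self._states.append(MappingProxyType(new_state))
            t = len(self._states) - 1
            self.log.append((t, dict(changes)))
            return t

    def snapshot(self, as_of=None):
        """Read-only view as of commit t (default: latest). Readers skip the
        lock; list append is atomic in CPython, which is enough for a toy."""
        return self._states[-1] if as_of is None else self._states[as_of]


conn = SingleTransactor()
t1 = conn.commit({"alice/email": "a@example.com"})
held = conn.snapshot()  # a reader holds on to this value
t2 = conn.commit({"alice/email": "alice@example.com"})

print(held["alice/email"])                     # a@example.com
print(conn.snapshot()["alice/email"])          # alice@example.com
print(conn.snapshot(as_of=t1)["alice/email"])  # a@example.com
```

The reader holding `held` keeps seeing one consistent state no matter how many commits land afterwards, which is why reads can scale independently of the single writer.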

spacemanmatt over 10 years ago

It would seem a much more honest argument to me if MVCC were being compared to another implementation of consistency management, rather than compared to NO IMPLEMENTATION at all.

epe over 10 years ago

I find this a more informative read on essentially the same topic: http://martin.kleppmann.com/2014/11/25/hermitage-testing-the-i-in-acid.html

friendzis over 10 years ago

> “what’s the difference between Consistency and Isolation again?”

This is what bothers me most about this article. How can a person who seemingly has advanced knowledge of relational databases not understand that Consistency is about dataset state and Isolation is about transactions? Consistency isn't about getting consistent results between transactions or even queries; it's that if a key is declared as an integer, then `itoa()` works on it (given the bit length isn't too large) no matter what, period.

If you want to support concurrent read and write operations on a database, then it's not much different from multithreaded programming: hard, and impossible to get right without compromises.

taeric over 10 years ago

I'm not entirely sure I understand how things are any better in either world. Especially if you are approaching the problem with the idea that you will have no failure conditions, things are going to be tough. Prohibitively so.

Terr_ over 10 years ago

I think part of the goal with Eventual Consistency is that it lets you attack the underlying issues in a qualitatively different way.

Relying on the DB, your options are constrained by your choice of relational tables and rows, and further affected by your DB vendor and configuration. When you lift the conflicts out to a higher layer of abstraction, you can do something more object-oriented, leveraging some of the same tools you'll need anyway to deal with other inconsistencies and quirks.
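
One well-known way to lift conflict handling into the application layer is a conflict-free replicated data type (CRDT), whose merge function is designed so that replicas converge regardless of the order in which updates arrive. The grow-only counter below is an illustrative choice, not something the comment prescribes; the class and method names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the per-replica maximum, so merges are commutative,
    associative, and idempotent, and replicas converge."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

The trade-off shows up in the type itself: a G-counter can only count up, so richer data needs richer merge rules, and designing those is the work that eventual consistency moves into the application.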

mwcampbell over 10 years ago

> Sorry, I'm not impressed with serializable isolation via a single writer mutex.

I think that for a large number of applications, this would work just fine. Especially if the underlying storage is SSD-based, something that's now easily obtained via VPS providers like DigitalOcean, Linode, and Vultr. After all, most database-driven applications aren't large-scale operations. So it's probably better to favor correctness and simplicity over concurrency and scalability.

grogers over 10 years ago

I'm probably in the minority on this, but I think InnoDB's docs on its isolation modes are clearer and easier to understand than Postgres'. That could be because I have internalized them over many years, though.

To me, the only sensible isolation level for transactions is serializable. We've had so many consistency issues because developers didn't understand when to lock and when not to. Everyone (including me) gets it wrong from time to time.

Very careful use of nonlocking selects can be correct and improve concurrency, but across a huge codebase, bugs slip in. If I could turn on a mode where the default is "select ... lock in share mode" but opt into a "select ... nonlocking", I would be eternally grateful. Judicious use of wrappers where we force the user to specify the lock mode has helped curb this trend quite a bit.

I'd still take MVCC with tons of warts over eventual consistency any day of the week. It is impossible to reason about eventual consistency because the database is literally allowed to do basically anything. Causal consistency should really be the default starting point for AP systems.
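
A wrapper that forces callers to pick a lock mode might look like this minimal Python sketch. `LockMode` and `build_select` are hypothetical names; the suffixes are InnoDB's real locking-read clauses (`LOCK IN SHARE MODE`, `FOR UPDATE`). Since "select ... nonlocking" is not actual MySQL syntax, the nonlocking mode simply emits a plain SELECT, which InnoDB normally treats as a consistent nonlocking read:

```python
from enum import Enum


class LockMode(Enum):
    """How an InnoDB SELECT should lock the rows it reads."""
    NONLOCKING = ""                 # plain consistent (snapshot) read
    SHARED = " LOCK IN SHARE MODE"  # shared locks on the rows read
    EXCLUSIVE = " FOR UPDATE"       # exclusive locks on the rows read


def build_select(sql, *, lock):
    """Hypothetical wrapper: `lock` is keyword-only with no default, so every
    call site has to state its locking intent explicitly."""
    if not isinstance(lock, LockMode):
        raise TypeError("lock must be a LockMode member")
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("build_select only builds SELECT statements")
    return sql.rstrip().rstrip(";") + lock.value


print(build_select("SELECT balance FROM accounts WHERE id = %s", lock=LockMode.SHARED))
# SELECT balance FROM accounts WHERE id = %s LOCK IN SHARE MODE
```

Because `lock` has no default, a bare `build_select(sql)` fails with a TypeError, so forgetting to think about locking becomes an immediate error rather than a latent race.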

rectang over 10 years ago

While they aren't distributed, Apache Lucene and Apache Lucy (the "loose C" port of Lucene that I work on) both implement MVCC: each index reader object represents a point-in-time view of the index that persists for the object's lifetime.

Core developers such as myself have to deal with some subtleties when implementing MVCC, but I wouldn't say that MVCC is too terribly hard for our users. The thing is, our interface does not look anything like SQL: you have to use a different mental model when interacting with our data stores, and of course they are not suitable for all applications where you might use SQL.

What I took away from the article is that MVCC does not fit well within SQL semantics.

grandalf over 10 years ago

This is a great thing to read and discuss, as opposed to blindly jumping on one bandwagon or another.

A relational database is a framework for storing data that comes with some rules and in exchange helps to guarantee certain characteristics. A flat file is too. The choice of which one to use depends on the problem.

Often there are several good choices. Imagine all the relational DB zealots who frown upon using a relational DB in a slightly unconventional way, such as storing events in a single table using a JSON column in Postgres. Yet Postgres provides some useful features, and so it may actually be very smart to use it that way.

klochner over 10 years ago

> Does the first part of this excerpt contradict the second part? (Emphasis mine)

No, it's perfectly clear and consistent. Does anyone else find it confusing?

debacle over 10 years ago

I wish people would not blog as experts about things they don't know a great deal about.

empthought over 10 years ago

> Many developers, including myself, have written applications that fall afoul of the MVCC implementation and rules.

Color me shocked.

imanaccount247 over 10 years ago

He's trying to claim that MVCC pushes complexity onto the user, but his examples of that are just typical "MySQL does it wrong". So that means MySQL pushes complexity onto the user, not MVCC.
