I have been advocating for the longest time that "repeatable read" is just a bad idea. Even if implementations were perfect. Even when it works correctly in the Database, it is still very tricky to reason about when dealing with complex queries.<p>I think two isolation levels that make sense are either:<p>* read committed<p>* serializable<p>You either go all the way to have a serializable setup, where there are no surprises. OR, you go in read committed direction where it is obvious that if you want have a consistent view of the data within a transaction, you have to lock the rows before you start reading them.<p>Read committed is very similar to just regular multi-threaded code and its memory management, so most engineers can get a decent intuitive sense for it.<p>Serializable is so strict that it is pretty hard to make very unexpected mistakes.<p>Anything in-between is a no man's land. And anything less consistent than Read Committed is no longer really a database.
How does append (a) map onto actual SQL operations on the given tables? Are the TEXT fields being used as lists?<p>Also… I’ve been issues in MySQL repeatable read mode where a single SELECT, selecting a single row, returned impossible results. I think it was:<p><pre><code> SELECT min(value), max(value) FROM table WHERE id = 1;
</code></pre>
where id is a primary key. I got two <i>different</i> values for min and max. That was a fun one.
I appreciate the write-up and the nod to AWS RDS. However, I was wondering if there was any focus on AWS Aurora (MySQL)? For those that don't know, AWS build a protocol compatible database platform that pretends to be MySQL or PostgreSQL. It would be interesting to see if Aurora MySQL has the same "features" as RDS or even MariaDB.
Facinating read. I think it is a great illustration to show how many "practically working systems" can be built on the foundation exhibiting so many consistency artifacts
> In 2022 Jepsen commissioned the University of Porto’s INESC TEC to develop LazyFS: a FUSE filesystem for simulating the loss of un-fsynced writes<p>I love this; what a great example of pushing the state of the art forward. Kudos!
how about that, I planned to do some work today.<p>aphyr, thank you. call me maybe and later jepsen.io have been consistently some of the best content I've ever read on the internet.
"SELECT ... FOR UPDATE" seems to be the answer to all these issues right? Lock the rows you're going to be updating and suddenly everything works as advertised.
In my experience, most developers don't even consider isolation level in the first place and just take whatever the default is. Any race conditions are met with an 'oh that's weird', and then they move on.
How much of what is contained within this analysis of MySQL is going to be the same-same for MariaDB, given that it uses InnoDB as the default storage engine?
I understand why the default transaction isolation level of most DBMS is weaker than serializable (it's for benchmark purposes), but I'd argue the best default is serializable. Most DBMS users don't even know there are many consistency models [1]. They expect transactions to "just work," i.e. to appear to have occurred in some total order, which is the definition of serializability [2]. And to some who know when to use a weaker isolation level for better performance can always set it per transaction [3].<p>---<p>[1] <a href="https://jepsen.io/consistency" rel="nofollow noreferrer">https://jepsen.io/consistency</a><p>[2] <a href="https://jepsen.io/consistency/models/serializable" rel="nofollow noreferrer">https://jepsen.io/consistency/models/serializable</a><p>[3] <a href="https://www.postgresql.org/docs/16/sql-set-transaction.html" rel="nofollow noreferrer">https://www.postgresql.org/docs/16/sql-set-transaction.html</a>
My takeaways:<p>> 4.2 Recommendations<p>> The core problem is that MySQL claims to implement Repeatable Read but actually provides something much weaker. We see two avenues to resolve this problem.<p>> The first is to keep MySQL’s behavior as it is, and to clearly document the consistency model “Repeatable Read” actually provides. There is precedent in other databases: PostgreSQL’s Repeatable Read is actually Snapshot Isolation, and exhibits behaviors which violate PL-2.99 Repeatable Read. However, PostgreSQL’s documentation eventually mentions that their Repeatable Read implementation is actually Snapshot Isolation. MySQL could similarly document that their “Repeatable Read” means “Read Committed, plus some sort of guarantees that hold until the transaction writes something, at which point mysteries occur.” A precise characterization of those mysteries would be most welcome.<p>Calling what MySQL's "Repeatable Read" as "Read Committed plus..." would be more confusing as even the simplest repeated read without mutations wouldn't work as expected. The documentation should be more upfront about how MySQL "Repeatable Read" doesn't mean what might be expected. In the meantime keep the "MySQL consistent read documentation"[0] close by.<p>> The second option is to treat these behaviors as bugs and fix them. Jepsen would be delighted if MySQL and other vendors were to commit to providing PL-2.99 Repeatable Read. However, even satisfying the incomplete, ambiguous ANSI definition of Repeatable Read would be an improvement over current affairs.<p>I doubt this would be feasible with the extent of deployment. At scale bug-fixes are bugs in themselves. The best that could be done is to create distinct isolation levels for the existing MySQL-RR and the compliant RR isolation levels <i>(somewhat akin to the utf8/utf8mb4 evolution)</i>.<p>[0] <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html" rel="nofollow noreferrer">https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-re...</a>
What would be helpful is not just comparison to theoretical definition to isolation modes but rather comparison to other popular relational databases - PostgreSQL, MS SQL, Oracle ? Something developers need to mind if they want to assure compatibility
Related: ACIDRain for the repeatable read without explicit "select .. for update" locking gotcha gift that still keeps on giving: <a href="https://news.ycombinator.com/item?id=20027532">https://news.ycombinator.com/item?id=20027532</a>
Serious question. I have this question for, like 20 years already.<p>Why would anyone start a new project with MySQL? Is it really superior in anything?
I'm in industry for 20+ years and as far as I remember MySQL was always the worst and most popular RDBMS at any given moment.