Besides being a well-written and technically interesting series, the Call Me Maybe posts have been very revealing about how different organizations respond to criticism. Especially considering that all of the target applications are open source, the project maintainers should be profusely thankful that someone has taken the time for such thorough analysis, presumably much deeper than anything the maintainers themselves appear to have done (at least on consistency behavior), and revealed bugs whose fixes should ultimately make the projects that much stronger.
<i>I do not quite like the usage of the word “corrupted” here. For me, the more correct word be to use is “inconsistent”.</i><p>Aren't we talking about situations in which a database tracking account balances creates money out of thin air, or vaporizes it unexpectedly?<p>I feel like Aphyr is always at pains to talk about the real-world implications of these findings: not just how bad they are in sensitive applications, but also the kinds of places where you can get away with these "inconsistencies".
Am I missing something or does this not address the main issue the original article raised: The documentation is simply incorrect. It claims to support SNAPSHOT ISOLATION but does not. The company knows this and even this article says the behaviour "is totally expected".<p>Seems like the first response should be to fix the docs and not claim capabilities beyond what's implemented.<p>(Also it was pretty clear from the original article that corrupted data meant the balances were incorrect, not that the file was corrupted like a bad checksum.)
Below are Aphyr's tweets on the same topic (with the language slightly toned down). I never thought I wouldn't find them mentioned here on HN. :) Nevertheless, they are precise and to the point:<p>Buncha people giving me <filth> for calling data written through an invariant violation "corrupted state", like somehow it's not garbage.<p>If TCP checksums don't work right we don't call the packet "inconsistent." We call it corrupt. If a disk shuffles your file's bits? Corrupt.<p>I use the word corrupt to emphasize that not only has the system <messed> up, but you have <i>no way to know</i> your data is now <messed> up.
There are many approaches to data consistency, eventual consistency among them, and not only in databases but in any parallel programming. Different CPU architectures, for example, have for years provided different memory consistency models, trading off performance against usability for application programmers.<p>It is important, however, that the behavior in this case is clearly documented.<p>I think there is decent documentation describing how transactions work in InnoDB: <a href="http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-model.html" rel="nofollow">http://dev.mysql.com/doc/refman/5.7/en/innodb-transaction-mo...</a><p>Galera would benefit from clearer documentation about exactly which data consistency model it provides.<p>At the same time, Percona XtraDB Cluster and MariaDB Cluster can be used to build reliable applications, assuming you're writing to their consistency model correctly.
I think he's trying to define "corrupted" data as completely inaccessible data (because of file corruption, presumably). However, if I zip up a picture of PaulGraham.jpg and when I unzip it I get one of Iron Man, I think I'd be within my rights to call the data corrupt.
> If you use this in a real life, the more obvious way to write these transactions is:<p>Is it? Do ORMs really do that, or is it one of those "SQL was designed to be used this way, but nobody using SQL read the design documents" cases?
Edit: ignore this, it's addressing a completely different situation, and I clearly didn't read the article well enough. The code in the article writes all of the locations it reads, so true SI ought to keep you safe. My apologies!<p>Yet another edit: huh, it seems that InnoDB in RR doesn't roll back when you write to a row that's been written since you started the transaction. TIL.<p>--------<p>It's worth noting here that (AFAIR) Oracle's 'SERIALIZABLE' (actually SI) level suffers from this exact write skew vulnerability, so MySQL/MariaDB is not alone in this issue. As pointed out, SELECT FOR UPDATE is a commonly used remedy.<p>What it comes down to is that while re-reading data in a given transaction under SI will give you the same result, <i>it doesn't guarantee that the data in the DB itself has stayed constant</i>. If you want to guarantee that the data won't change, you need to lock it.<p>IIRC this also applies to PostgreSQL's REPEATABLE READ level.
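To make the write skew point concrete, here is a minimal toy sketch in Python. It is not how InnoDB, Galera, Oracle, or PostgreSQL are implemented; `Store`, `Txn`, `withdraw`, and `run` are hypothetical names invented for illustration. It assumes snapshot reads plus first-committer-wins on written keys, and it stands in for SELECT FOR UPDATE with a commit-time check on the rows that were read "for update" (a real lock would block the second transaction rather than abort it).

```python
class Conflict(Exception):
    """Raised when a transaction cannot commit."""


class Store:
    """Toy versioned key-value store (illustration only)."""

    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit time of last write
        self.clock = 0                           # logical commit counter

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.start = store.clock
        self.snapshot = dict(store.data)  # all reads see this frozen copy
        self.writes = {}
        self.locked = set()               # keys read "for update"

    def read(self, key, for_update=False):
        if for_update:
            self.locked.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins on written keys, plus the keys we "locked".
        for key in set(self.writes) | self.locked:
            if self.store.version[key] > self.start:
                raise Conflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock


def withdraw(txn, account, amount, for_update):
    # Invariant the application wants: a + b must never go negative.
    total = txn.read("a", for_update) + txn.read("b", for_update)
    if total >= amount:
        txn.write(account, txn.read(account) - amount)


def run(for_update):
    store = Store({"a": 50, "b": 50})
    t1, t2 = store.begin(), store.begin()
    withdraw(t1, "a", 100, for_update)  # both see total == 100
    withdraw(t2, "b", 100, for_update)
    t1.commit()
    try:
        t2.commit()
    except Conflict:
        pass  # second transaction is rejected
    return store.data["a"] + store.data["b"]


print(run(for_update=False))  # plain snapshot reads: both commit
print(run(for_update=True))   # reads guarded: second commit is rejected
```

With plain snapshot reads both transactions commit, because they wrote different rows, and the combined balance ends up negative even though each transaction saw a consistent snapshot. Once the reads are guarded, the second commit is rejected and the invariant holds.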
A lot of defensive talk around technical terms, mixed with a bunch of typos and topped off with unfair attacks.<p>Not very classy. And Aphyr's point still stands, I think. In the default mode it is easy to get corrupt data with Galera Cluster. That InnoDB on a single instance can have the same problem makes it all the more troubling, and I'm glad I moved away from MySQL a long time ago.
> Following that conclusion is using Galera cluster may result in “corrupted” data. I do not quite like the usage of the word “corrupted” here. For me, the more correct word be to use is “inconsistent”.<p>But Aphyr never once uses the term "corrupted data", or the word "corrupted". If you're going to quote an article, it's important to be precise.<p>This response feels panicked, or at least rushed. And it really misses the point. You can't, or shouldn't, try to explain away these types of findings as irrelevant, or just an issue of semantics. Instead, I'd hope by now that technical folks on the receiving end of a Call Me Maybe analysis would have learned that there's precisely one correct way to respond: acknowledge the faults, clarify relevant documentation, and file (and link to) issues in public issue-trackers that will address the problems.<p>HashiCorp, CoreOS, and arguably Elastic played it correctly. Aerospike, Mesos, and now Percona, didn't. Shame.
How is <i>inconsistent</i> different from <i>just plain wrong</i>?<p>This may be another example of how reactions to Aphyr's reports reveal the developers' mental model.