From 2009 to 2012 I had a distributed database startup that competed with MongoDB. We used Paxos for replication and built the database with on-disk consistency guarantees --- like the ones this article looks for and rightly obsesses over --- in mind.<p><a href="https://github.com/scalien/scaliendb" rel="nofollow">https://github.com/scalien/scaliendb</a><p>Outcome: you've never heard of ScalienDB; MongoDB brilliantly won by winning the hearts and minds of hackers and coders who don't care about such issues, but were able to get started quickly with Mongo (and got cool free cups at meetups). It turns out that's most engineers out there, and definitely the initial critical mass to target for a database startup like Mongo.<p>Btw, the story behind Oracle is similar: early versions were basically write-only; read 'Softwar', the book about Ellison. Of course there are other ways to get started: for example, DBs coming out of academic research like Vertica seem to avoid this problem; in that case initial funding is basically provided by the gov't, and when they create the company to commercialize they're already shooting for enterprise contracts, skipping the open-source/community-building phase of Mongo.
The most interesting lessons from the Jepsen series:<p>* You should never trust, and always verify, the claims made by database manufacturers.<p>* Especially when those claims relate to data integrity.<p>* Super-especially when every safety level provided by the manufacturer that includes the word "SAFE" is actually unsafe.
Mongo absolutely nailed creating a database that makes it easy to get started and even to do things that are traditionally 'hard', such as replication. It is still super attractive for me to pick it up for small projects, even after dealing with its (many) pain points in both development and operational settings.<p>Given this, it is so tragic to see how dismissive they have been of the consistency issues that have plagued the db since the early days. Whether it was the stupidity of driver defaults that did not confirm writes, or easily corruptible data in the 1.6 days, or now not seriously looking at the results of Jepsen, the MongoDB organization has never taken the issues head on. It would be so refreshing to see more transparency and an admission of the faults rather than wiggling around them until eventually pushing a fix buried in patch notes.<p>I often feel like a MongoDB apologist when I admit that I don't mind using Mongo for small (and not important) projects, and while the MongoDB hate can be a bit extreme at times, the company's treatment of these sorts of issues may justify some of it.
There's a lot going on here, but the summary is: "What Mongo actually does is allow stale reads: it is possible to execute a WriteConcern=MAJORITY write of a new value, wait for it to return successfully, perform a read with ReadPreference=PRIMARY, and not see the value you just wrote."<p><a href="https://jira.mongodb.org/browse/SERVER-17975?focusedCommentId=892980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-892980" rel="nofollow">https://jira.mongodb.org/browse/SERVER-17975?focusedCommentI...</a>
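To make the quoted definition concrete, here is a minimal sketch of how a stale read could be spotted in a recorded client history. It is not Jepsen's checker and it does not talk to MongoDB; the `Op` record, the `stale_reads` function, and the timestamps are all made up for illustration. It assumes a single register, writes that do not overlap each other, unique written values, and reads that only ever return the initial value or a written value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    kind: str     # "write" or "read"
    value: int    # value written, or value the read returned
    start: float  # when the client sent the request
    end: float    # when the client received the acknowledgement


def stale_reads(history: List[Op], initial: int = 0) -> List[Op]:
    """Return reads that were sent after a write was acknowledged
    but still returned a value older than that write."""
    writes = sorted((op for op in history if op.kind == "write"),
                    key=lambda op: op.start)
    # Position of each value in the write order; the initial value comes first.
    order = {initial: -1}
    for i, w in enumerate(writes):
        order[w.value] = i

    stale = []
    for r in (op for op in history if op.kind == "read"):
        # Newest write whose acknowledgement arrived before this read was sent.
        acked = [order[w.value] for w in writes if w.end < r.start]
        newest_acked = max(acked, default=-1)
        if order[r.value] < newest_acked:
            stale.append(r)
    return stale


# Hypothetical history: the write of 2 is acknowledged at t=3.0,
# yet a read sent at t=3.5 still returns 1.
history = [
    Op("write", 1, 0.0, 1.0),
    Op("read", 1, 1.5, 2.0),
    Op("write", 2, 2.0, 3.0),
    Op("read", 1, 3.5, 4.0),
]
print(stale_reads(history))
```

The second read is the case the summary describes: the write has already returned successfully, and a later read does not see it.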
Question: How do I actually run Kyle's tests to see this for myself? (Not that I don't believe him, I just want to play around a bit.)<p>When I run `lein install` and then `lein test`, I get:<p><pre><code> ╰─▶ ψ lein test
Exception in thread "main" java.io.FileNotFoundException:
Could not locate jepsen/db__init.class or jepsen/db.clj on classpath: ,
compiling:(mongodb/core.clj:1:1)
at clojure.lang.Compiler.load(Compiler.java:7142)
at clojure.lang.RT.loadResourceScript(RT.java:370)
at clojure.lang.RT.loadResourceScript(RT.java:361)</code></pre>
People really underestimate the value of Occasional Consistency. Occasionally Consistent databases, like MongoDB, are great for approximation algorithms, sublinear time algorithms, and similar applications.
Since Postgres added a JSON type and Docker made running it simple in development, I haven't had a need for anything else. Call me old school, but I prefer starting with a relational database and changing when it's no longer appropriate.
So what should users of MongoDB do? I'm asking because it is the main database used in Meteor and I'm very interested in Meteor.<p>Should the general advice just be "store in MongoDB everything that doesn't require consistency and use PostgreSQL for everything else"?
I still don't get it. MongoDB can't possibly call itself a database. I can understand MongoScratchStorage or MongoProbabilisticDataEngine, but not MongoDB.
Another instance of Kyle's amazing research! You may want to catch him on stage with other great minds at dotScale on June 8: <a href="http://dotscale.io" rel="nofollow">http://dotscale.io</a>
I seem to remember from a FoundationDB talk that they first spent two years building a simulation environment to control everything from network to persistence for testing scenarios.<p>Does anyone know of any open-source project that aims to do the same, so that future NoSQL DBs can finally be built on strong foundations?
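For readers unfamiliar with the idea, here is a toy sketch of what "controlling the network" in a simulation can mean. It is not FoundationDB's simulator; `SimNetwork`, `run`, and the drop rate are invented for illustration. The only point it shows is that when every fault decision comes from one seeded random number generator, a failing scenario can be replayed exactly from its seed.

```python
import random


class SimNetwork:
    """Fake network whose drops and reordering are driven by a seeded RNG."""

    def __init__(self, seed: int, drop_rate: float = 0.2):
        self.rng = random.Random(seed)  # the single source of nondeterminism
        self.drop_rate = drop_rate
        self.queue = []

    def send(self, msg):
        # Drop the message with probability drop_rate, otherwise queue it.
        if self.rng.random() >= self.drop_rate:
            self.queue.append(msg)

    def deliver_all(self):
        # Deliver the queued messages in a shuffled (but reproducible) order.
        self.rng.shuffle(self.queue)
        delivered, self.queue = self.queue, []
        return delivered


def run(seed: int):
    """Run one simulated scenario and return what the receiver saw."""
    net = SimNetwork(seed)
    for i in range(10):
        net.send(i)
    return net.deliver_all()


# The same seed replays the same drops and the same delivery order.
print(run(42) == run(42))
```

A real harness would also simulate disks, clocks, and process crashes, but the replay-from-seed property is the part that makes rare bugs debuggable.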
I knew something was funny with Mongo when all the API calls defaulted to writes not being guaranteed to sync to disk. Maybe for a use case like aggregate statistics gathering it would be OK to risk missing a few updates in a crash for the sake of speed, but to make that the default??
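The speed-versus-durability trade-off mentioned here can be sketched with plain files. This is a generic illustration, not MongoDB's storage code; `append_record`, `read_records`, and the log file name are made up. With `sync=False` the data is handed to the operating system and may still sit in its cache if the machine loses power; with `sync=True` the call returns only after `os.fsync` has asked the OS to push the data to the storage device.

```python
import os
import tempfile


def append_record(path: str, record: bytes, sync: bool) -> None:
    """Append one record; with sync=True, wait for the OS to flush it to disk."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        if sync:
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache to the device


def read_records(path: str):
    with open(path, "rb") as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "writes.log")
    append_record(log, b"stats-update", sync=False)  # fast, may be lost in a crash
    append_record(log, b"payment", sync=True)        # slower, flushed before returning
    records = read_records(log)

print(records)
```

Both records are readable in normal operation, which is exactly why an unsafe default is easy to miss: the difference only shows up after a crash.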
I must admit, I always feel like I am missing something in these discussions. Like I didn't get some memo... I just don't expect a DB like MongoDB to guarantee consistency. The whole story around NoSQL and the like was to enable the crazy horizontal scaling needed for the web. Phrases like "eventual consistency" flew around. It seems so logical - you lose consistency, gain scalability.<p>But somehow, people simply started using them everywhere? Assuming that these DBs are just like any other? And now, we're all bashing MongoDB because it is - not consistent? What happened here? :)<p>NB that I do not wish to attack the OP - if MongoDB now claims to be consistent in any way, that deserves scrutiny. And these analyses are always a really interesting read. But the general tone in the developer community about MongoDB seems a bit irrational.
Does anyone have any references on how you <i>could</i> write a distributed database that met all ACID properties? Surely there's an academic paper that says that if you do A then B then C, you are guaranteed a certain level of consistency.<p>We've developed a type of distributed database at my company, and I think it's pretty solid, but I need a broader familiarity with the available theory.
Shame this wasn't done with the latest version, 3.0, although given that improvements are scheduled for 3.1, I would imagine it might still be an issue.<p>Nice writeup either way though. Would like to see a similar article for Couch* and MySQL.