Call Me Maybe: MongoDB Stale Reads

605 points by llambda about 10 years ago

24 comments

Maro about 10 years ago

From 2009 to 2012 I had a distributed database startup that competed with MongoDB. We used Paxos for replication and built the database with on-disk consistency guarantees in mind, like the ones this article looks for and rightly obsesses over.

https://github.com/scalien/scaliendb

Outcome: you've never heard of ScalienDB. MongoDB brilliantly won by winning the hearts and minds of hackers and coders who don't care about such issues but were able to get started quickly with Mongo (and got cool free cups at meetups). It turns out that's most engineers out there, and definitely the initial critical mass to target for a database startup like Mongo.

By the way, the story behind Oracle is similar: early versions were basically write-only; read Ellison's book 'Softwar'. Of course there are other ways to get started. For example, DBs coming out of academic research, like Vertica, seem to avoid this problem: initial funding is basically provided by the gov't, and when they create the company to commercialize, they're already shooting for enterprise contracts, skipping the open-source/community-building phase of Mongo.

bkeroack about 10 years ago

If you are a database author and you get a bug report from Kyle, spend a long time thinking about it before closing the issue as invalid.

jxf about 10 years ago

The most interesting lessons from the Jepsen series:

* You should never trust, and always verify, the claims made by database manufacturers.
* Especially when those claims relate to data integrity.
* Super-especially when every safety level provided by the manufacturer that includes the word "SAFE" is actually unsafe.

addisonj about 10 years ago

Mongo absolutely nailed creating a database that is easy to get started with, and that even makes traditionally 'hard' things such as replication easy. It is still super attractive for me to pick it up for small projects, even after dealing with its (many) pain points in both development and operational settings.

Given this, it is so tragic to see how dismissive they have been with regard to the consistency issues that have plagued the db since the early days. Whether it was the stupidity of bad driver defaults that did not confirm writes, or easily corruptible data in the 1.6 days, or now not seriously looking at the results of Jepsen, the MongoDB organization has never taken the issues head on. It would be so refreshing to see more transparency and an admission of the faults, rather than wiggling around them until eventually pushing a fix buried in patch notes.

I often feel like a MongoDB apologist when I admit that I don't mind using Mongo for small (and not important) projects. While the MongoDB hate can be a bit extreme at times, the company's treatment of these sorts of issues may justify some of it.

dantiberian about 10 years ago

There's a lot going on here, but the summary is: "What Mongo actually does is allow stale reads: it is possible to execute a WriteConcern=MAJORITY write of a new value, wait for it to return successfully, perform a read with ReadPreference=PRIMARY, and not see the value you just wrote."

https://jira.mongodb.org/browse/SERVER-17975?focusedCommentId=892980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-892980
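
To make the shape of that anomaly concrete, here is a toy, in-memory Python sketch. It is not MongoDB code and does not use any MongoDB driver; `Node`, `majority_write`, `primary_read`, and `demo` are hypothetical names invented for illustration. It assumes a three-node replica set in which a network partition isolates the old primary, the majority side elects a new primary, and the old primary has not yet noticed that it was deposed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica-set member (hypothetical model, not a MongoDB API)."""
    name: str
    value: int = 0
    believes_primary: bool = False
    reachable: set = field(default_factory=set)  # names of peers this node can reach


def majority_write(primary, nodes, value):
    """Apply the write on the primary and every peer it can reach.

    Returns True only if a strict majority of all nodes acknowledged it.
    """
    acked = [primary] + [n for n in nodes if n.name in primary.reachable]
    if len(acked) * 2 <= len(nodes):
        return False  # no majority: the write is not acknowledged
    for n in acked:
        n.value = value
    return True


def primary_read(node):
    """Serve a read from any node that *believes* it is the primary."""
    if not node.believes_primary:
        raise RuntimeError(f"{node.name} is not a primary")
    return node.value


def demo():
    n1 = Node("n1", believes_primary=True, reachable={"n2", "n3"})
    n2 = Node("n2", reachable={"n1", "n3"})
    n3 = Node("n3", reachable={"n1", "n2"})
    nodes = [n1, n2, n3]

    # Healthy cluster: the value 1 is replicated to all three nodes.
    assert majority_write(n1, nodes, 1)

    # Partition: n1 is cut off; n2 and n3 can still talk to each other.
    n1.reachable = set()
    n2.reachable = {"n3"}
    n3.reachable = {"n2"}
    # The majority side elects n2. n1 has not stepped down yet (assumption).
    n2.believes_primary = True

    ok = majority_write(n2, nodes, 2)  # acknowledged by n2 and n3
    stale = primary_read(n1)           # the old primary still answers reads
    fresh = primary_read(n2)
    return ok, stale, fresh


if __name__ == "__main__":
    print(demo())  # (True, 1, 2)
```

The majority write of 2 succeeds, yet a read routed to the node that still thinks it is primary returns 1. The sketch only illustrates the window between a new primary being elected and the old one stepping down; it says nothing about how long that window is in a real deployment.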

jamescostian about 10 years ago

I'm so glad to see the Jepsen series reinstated. Thank you so much, Stripe.

jxf about 10 years ago

Question: How do I actually run Kyle's tests to see this for myself? (Not that I don't believe him, I just want to play around a bit.)

When I run `lein install` and then `lein test`, I get:

<pre><code>╰─▶ ψ lein test
Exception in thread "main" java.io.FileNotFoundException: Could not locate jepsen/db__init.class or jepsen/db.clj on classpath: , compiling:(mongodb/core.clj:1:1)
    at clojure.lang.Compiler.load(Compiler.java:7142)
    at clojure.lang.RT.loadResourceScript(RT.java:370)
    at clojure.lang.RT.loadResourceScript(RT.java:361)</code></pre>

cpks about 10 years ago

People really underestimate the value of Occasional Consistency. Occasionally Consistent databases, like MongoDB, are great for approximation algorithms, sublinear time algorithms, and similar applications.

geowa4 about 10 years ago

Since Postgres added a JSON type and Docker made running it simple in development, I haven't had a need for anything else. Call me old school, but I prefer starting with a relational database and changing when it's no longer appropriate.

ivanb about 10 years ago

So what should users of MongoDB do? I'm asking because it is the main database used in Meteor and I'm very interested in Meteor.

Should the general advice just be "store in MongoDB everything that doesn't require consistency and use PostgreSQL for everything else"?

rdtsc about 10 years ago

I still don't get it. MongoDB can't possibly call itself a database. I can understand MongoScratchStorage or MongoProbabilisticDataEngine, but not MongoDB.

sylvinus about 10 years ago

Another instance of Kyle's amazing research! You may want to catch him on stage with other great minds at dotScale on June 8: http://dotscale.io

Kiro about 10 years ago

This article is too technically advanced for me. As a casual MongoDB user, how do these problems affect me?

bsaul about 10 years ago

I seem to remember from a FoundationDB talk that they first spent two years building a simulation environment to control everything from network to persistence for testing scenarios.

Does anyone know of any open-source project that aims to do the same, so that future NoSQL DBs can finally be built on strong foundations?

narrator about 10 years ago

I knew something was funny with Mongo when all the API calls defaulted to writes not being guaranteed to sync to disk. Maybe for a use case like aggregate statistics gathering it would be OK to risk missing a few updates in a crash for the sake of speed, but to make that the default??

lobo_tuerto about 10 years ago

I think it would be great to see one of these done for RethinkDB :)

bakhy about 10 years ago

I must admit, I always feel like I am missing something in these discussions. Like I didn't get some memo... I just don't expect a DB like MongoDB to guarantee consistency. The whole story around NoSQL and the like was to enable the crazy horizontal scaling needed for the web. Phrases like "eventual consistency" flew around. It seems so logical: you lose consistency, gain scalability.

But somehow, people simply started using them everywhere? Assuming that these DBs are just like any other? And now, we're all bashing on MongoDB because it is - not consistent? What happened here? :)

NB: I do not wish to attack the OP. If MongoDB now claims to be consistent in any way, that deserves scrutiny. And these analyses are always a really interesting read. But the general tone in the developer community about MongoDB seems a bit irrational.

agopaul about 10 years ago

So, now I'm wondering: why is Stripe using Mongo at all? Maybe they are planning to migrate to another DBMS?

ccleve about 10 years ago

Does anyone have any references on how you could write a distributed database that met all ACID properties? Surely there's an academic paper that says that if you do A then B then C, you are guaranteed a certain level of consistency.

We've developed a type of distributed database at my company, and I think it's pretty solid, but I need a broader familiarity with the available theory.

posnet about 10 years ago

Would the use of WiredTiger as a storage engine affect these results?

chatman about 10 years ago

Apache Solr has done very well at Jepsen tests.

chucksmart about 10 years ago

Maybe we should listen to Larry Ellison when he says "gimme my money!"

pje about 10 years ago

Upvoted for the Look Around You link alone.

threeseed about 10 years ago

Shame this wasn't done with the latest version, 3.0, although given that improvements are scheduled for 3.1, I would imagine it might still be an issue.

Nice writeup either way, though. Would like to see a similar article for Couch* and MySQL.
