> During the migration, there were a number of fields that should have been set in Mongo but were not

Imagine that... this fascination with schema-less datastores just baffles me:

http://draconianoverlord.com/2012/05/08/whats-wrong-with-a-schema.html

I'm sure schema-less datastores are a huge win for your MVP release when it's all greenfield development, but from my days working for enterprises, it seems like you're just begging for inconsistencies to sneak into your data.

Although, in the enterprise, data actually lives longer than 6 months--by which time I suppose most startups are hoping to have been bought out.

(Yeah, I'm being snarky; none of this is targeted at bu.mp, who obviously understand the pros/cons of schemas, having used protobufs and Mongo. I'm more just talking about how any datastore that's not relational these days touts the lack of a schema as an obvious win.)
I was reading along and nodding my head until I got to the 1000-line Haskell program that handles issues stemming from a lack of consistency.

I'm not exactly a SQL fanboy, but maybe ACID is kinda useful in situations like this, and writing your own application-land 1000-liners for stuff that got solved in SQL land decades ago isn't the best use of time?
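For contrast, here's a toy sketch of what "let the database enforce it" looks like, using Python's built-in sqlite3 as a stand-in for a real RDBMS (the table and column names are made up):

    import sqlite3

    # With a schema, "fields that should have been set but were not" get
    # rejected at write time instead of needing a cleanup program later.
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)')

    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute('INSERT INTO users (id, email) VALUES (?, ?)', (1, None))
    except sqlite3.IntegrityError as err:
        print('rejected at write time: %s' % err)  # NOT NULL constraint failed: users.email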
If you're thinking about using Riak, make sure you benchmark the write (put) throughput for a sustained period before you start coding. I got burnt by this.

I was using the LevelDB backend with Riak 1.1.2, as my keys are too big to fit in RAM.

I ran tests on a 5-node dedicated server cluster (fast CPUs, 8GB RAM, 15k RPM spinning drives), and after 10 hours Riak was only able to write 250 new objects per second.

Here's a graph showing the drop from 400/s to 300/s: http://twitpic.com/9jtjmu/full

The tests were done using Basho's own benchmarking tool (basho_bench), with the partitioned sequential integer key generator and 250-byte values. I tried adjusting the ring_size (1024 and 128) and the LevelDB cache_size etc., and it didn't help.

Be aware of the poor write throughput if you are going to use it.
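If you want a quick sanity check without setting up basho_bench, here's a rough sketch using the official Python client -- the bucket name, value size, and intervals are placeholders, and the per-minute reporting is just so a gradual throughput drop shows up instead of being averaged away:

    import time
    import riak  # official Basho Python client

    client = riak.RiakClient()        # defaults to localhost
    bucket = client.bucket('bench')   # made-up bucket name

    VALUE = 'x' * 250                 # ~250 byte values, as in my test
    INTERVAL = 60                     # report throughput every minute
    DURATION = 10 * 60 * 60           # run for 10 hours

    start = time.time()
    key = 0
    while time.time() - start < DURATION:
        interval_start, writes = time.time(), 0
        while time.time() - interval_start < INTERVAL:
            bucket.new(str(key), data=VALUE).store()
            key += 1
            writes += 1
        print('%.0f writes/s' % (float(writes) / INTERVAL))

Note that a single-threaded loop like this measures latency as much as throughput; basho_bench runs many concurrent workers, so treat this as a shape-of-the-curve check rather than absolute numbers.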
I find these kinds of stories interesting, but without some feel for the size of the data, they're not very useful/practical.

I've heard of Bump, and used it once or twice, but I don't actually know how big or popular it is. If we're talking about a database for a few million users, only a tiny percentage of which are actively "bumping" at any time, it's really hard for me to imagine this is an interesting scaling problem.

Ex. If I just read an article about a "data migration" whose scale is something a traditional DBMS would yawn at, the newsworthiness would have to be re-evaluated.
I've decided I want to use Riak as well. I was wondering if anyone has examples of how they used it with their data model?

For example, this article mentions "With appropriate logic (set unions, timestamps, etc) it is easy to resolve these conflicts", but timestamps are not an adequate way to do this, because distributed systems only give you a partial ordering. The magicd may be serialising all requests to Riak to mitigate this (essentially using the time reference of magicd), in which case they're losing out on the distributed nature of Riak (magicd becomes a single point of failure / bottleneck).

Insight into how others have approached this would be awesome.
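To make the set-union point concrete, here's a tiny, client-agnostic sketch (nothing to do with Bump's actual magicd code): the reason unions work where timestamps don't is that union is commutative, associative and idempotent, so the merged value is the same no matter what order the siblings arrive in.

    def merge_siblings(siblings):
        """siblings: one set of values per conflicting Riak sibling."""
        merged = set()
        for s in siblings:
            merged |= s
        return merged

    # Order doesn't matter -- no wall clock needed:
    assert merge_siblings([{'alice', 'bob'}, {'bob', 'carol'}]) == \
           merge_siblings([{'bob', 'carol'}, {'alice', 'bob'}])

Deletes are the hard part: a plain union will resurrect removed items, which is why CRDTs like OR-sets track adds and removes separately rather than leaning on timestamps.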
Would be interesting to see a follow-up in 6 months or so...

It doesn't seem fair to compare [*old tech*] with [*new tech*] when you've felt all the pitfalls with one but not the other.
Random thought on proto buffers:
OP is advocating using the "required" modifier for fields and touting it as an advantage in comparison to JSON.
I would move the field-value verification logic to the client instead, because a "required" field causes backwards-compatibility problems if you ever need to un-require it.
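Something like this is what I mean -- the message and field names are invented, but the pattern is: keep everything optional in the .proto and do the presence checks in code you can change:

    REQUIRED_FIELDS = ('user_id', 'name')  # today's policy, not baked into the wire format

    def validate_user(msg):
        """msg: a generated protobuf message with optional user_id/name fields."""
        missing = [f for f in REQUIRED_FIELDS if not msg.HasField(f)]
        if missing:
            raise ValueError('missing fields: %s' % ', '.join(missing))
        return msg

If the business rule changes, you edit that list; a "required" in the .proto, by contrast, means every old reader rejects any message that drops the field.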