NoSQL databases benchmark: Cassandra, HBase, MongoDB, Riak

102 points by teoruiz over 12 years ago

16 comments

karterk over 12 years ago

I was part of a large migration that moved a significant amount of data from MongoDB to HBase recently. Along the way, I also spent significant time digging into Cassandra and Riak.

I appreciate the effort and intention behind this article, but for all practical purposes, such numbers are really not helpful. From experience, if you really want performance from HBase, you need to spend a significant amount of time coming up with the right way to structure your data in HBase. To name a few considerations:

* choosing the right row key that optimizes bulk scans
* setting optimum client and server caching based on the size of each row
* pre-splitting regions, and setting custom region sizes

You will also run into various cluster-related issues. Things don't really scale linearly as you add more nodes. You also need to consider maintenance, upgrades, backups, replication and so on.

If you want to choose a NoSQL database (or, for that matter, any database), spend some time thinking about whether it fits your data model and your own understanding of the technology. Performance is rarely gained by simply switching a few knobs.
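
To make the pre-splitting point in the comment above a little more concrete, here is a minimal sketch of how evenly spaced region boundaries could be computed. It assumes (this is not stated in the comment) that row keys start with a fixed-width hexadecimal hash prefix, so that keys spread roughly uniformly over the key space. The function name `presplit_points` is hypothetical, and the code only computes boundary strings; it does not call any HBase API.

```python
def presplit_points(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 evenly spaced boundaries over a hex key space.

    Assumption: row keys begin with a `width`-character lowercase hex prefix
    (for example, the first characters of a hash of the natural key).
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keyspace = 16 ** width  # number of distinct prefixes of this width
    return [
        format(i * keyspace // num_regions, f"0{width}x")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Boundaries for a table pre-split into 4 regions.
    print(presplit_points(4))
```

Whether uniform boundaries like these are appropriate depends on the row key design: a hashed prefix spreads writes evenly but gives up ordered scans over the natural key, which is exactly the kind of trade-off the comment is describing.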

jbellis over 12 years ago

Note that the HBase numbers are so good because the client wasn't actually hitting the database: http://brianoneill.blogspot.com/2012/10/solid-nosql-benchmarks-from-ycsb-w-side.html

jegoodwin3 over 12 years ago

Not sure I buy the paper's analysis. They don't seem to know much about load testing -- they speak as though throughput is 'load' (rather than the multiprogramming level) and don't compute bandwidth-delay products.

Also, rather than using M/M/1 or some reasonable analytic model, they deliberately throttled their request rates to hold throughput constant (thereby guaranteeing different loadings for different 'benchmarks').

Just reading the first graph, for example, and applying Little's Law, it's pretty evident that Cassandra was loaded more heavily than the two MySQL systems, with HBase and Riak trailing.

Looks like HBase and Cassandra lead the pack to me, with different characteristics for different purposes.

Advice to authors: buy a book by Neil Gunther.
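
For readers who have not met the two tools named in the comment above, here is a small sketch. Little's Law says the mean number of requests in flight equals throughput times mean response time, so two systems held at the same throughput but showing different latencies are carrying different loads. The M/M/1 formula gives the mean response time of a single-server queue with Poisson arrivals and exponential service times. The function names and all numbers below are illustrative assumptions, not values read from the benchmark.

```python
def requests_in_flight(throughput_ops_per_s: float, mean_latency_s: float) -> float:
    """Little's Law: N = X * R (mean concurrency = throughput * response time)."""
    return throughput_ops_per_s * mean_latency_s


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda).

    Only defined while the queue is stable (arrival_rate < service_rate).
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    # Hypothetical: two systems throttled to the same 2000 ops/s.
    # Different latencies imply different numbers of requests in flight.
    for name, latency_s in [("system A", 0.005), ("system B", 0.020)]:
        print(name, requests_in_flight(2000, latency_s))

    # Hypothetical M/M/1 server handling 100 req/s, offered 80 req/s.
    print(mm1_response_time(80, 100))
```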

jbellis over 12 years ago

A more rigorous set of tests (including datasets that don't fit in memory, for instance) was presented at VLDB this year: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf

HarrisonFisk over 12 years ago

I wonder why they picked MyISAM for MySQL, since basically no one sane is using that anymore. It isn't even the default anymore, so that can't be the excuse.

"MyISAM caches index blocks but not data blocks."

sbierwagen over 12 years ago

Running benchmarks dependent on I/O speed on AWS, where I/O speed fluctuates wildly depending on who you're sharing the hardware with?

Hmm.

mikkom over 12 years ago

What I find most interesting is how bad mongo performance is.

newman314 over 12 years ago

The article tries not to draw a conclusion but Cassandra's numbers look pretty good to me...

falcolas over 12 years ago

A note for people doing benchmarks - please post your configurations for the services with your post as well - it would be even better if your tests were posted on GitHub or something similar so others can reproduce your tests and validate/update them.

Also, unless you're running all of the tests concurrently, we really need to see your IOPS records during the tests, since a single lagging EBS volume in a RAID-0 array will still negatively affect disk performance (either that or spring for dedicated IOPS), and thus skew your benchmarks.

Making sure you're not getting a spike in CPU steal time would be great as well; same reason.
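
As one way to act on the steal-time advice above, here is a minimal sketch that computes the fraction of CPU time stolen by the hypervisor between two samples of the aggregate `cpu` line in Linux's `/proc/stat`. It assumes a Linux guest whose kernel reports the steal column (the eighth value on that line); the helper names are hypothetical, and a monitoring tool that already reports steal time would do the same job.

```python
def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat.

    Returns the first eight counters:
    user, nice, system, idle, iowait, irq, softirq, steal.
    Guest counters are ignored because they are already counted in user/nice.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu" or len(parts) < 9:
        raise ValueError("expected an aggregate 'cpu' line with a steal column")
    return [int(value) for value in parts[1:9]]


def steal_fraction(before: str, after: str) -> float:
    """Fraction of CPU time reported as 'steal' between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return deltas[7] / total if total > 0 else 0.0


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the aggregate cpu line (Linux only; it is the first line of the file)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    import time

    first = read_cpu_line()
    time.sleep(5)  # sampling interval chosen arbitrarily for the sketch
    second = read_cpu_line()
    print(f"steal: {steal_fraction(first, second):.2%}")
```

Logging this value alongside each benchmark run makes it easier to discard runs where a noisy neighbour took CPU away from the instance.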

dschiptsov over 12 years ago

Well, the data never leaves the JVM; that is why it is so "fast" - it never fsyncs.

What if the JVM instance crashes under load? Data is lost, but, see, it is not our fault - our code is OK.
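
For readers unfamiliar with the distinction the comment above is drawing, here is a generic sketch of what "fsyncing" a write means at the operating-system level. It illustrates the general technique only and says nothing about how any of the benchmarked databases were actually configured; the function name `durable_append` is hypothetical.

```python
import os


def durable_append(path: str, payload: bytes) -> int:
    """Append payload and ask the OS to push it to stable storage.

    Without flush() + fsync(), the bytes may sit in process or kernel buffers
    and can be lost if the process or machine crashes before they are written.
    Returns the file size after the append.
    """
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()             # move data from the process buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write it to the device
    return os.path.getsize(path)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "commit.log")
        print(durable_append(log_path, b"record-1\n"))
```

Each fsync costs a round trip to the storage device, which is why skipping or batching it makes write benchmarks look much faster at the price of a window in which acknowledged data can be lost.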

opendomain over 12 years ago

How would you compare SQL benchmarks? Oracle, MySQL, SQL Server, PostgreSQL? You can find a specific use case where any one of them will outperform the others. A lot of DBAs assume that Oracle is the most powerful, but it is also harder to manage and VERY expensive to run. I know that most NoSQL databases are free, except for support or multiple-datacenter usage - but what about the cost of DevOps? Or backup?

xoail over 12 years ago

I guess the conclusion pretty much sums it up. Every DB solution has its own advantages and disadvantages. I find many people jump on the latest bandwagon and later regret the choice, followed by a "Why we moved away from xyz database" post on their blog. Please make sure to analyze your application and future strategy fully before choosing one.

sh_vipin over 12 years ago

Thanks for this post. It was a really good comparison.

neya over 12 years ago

Thank you for posting this. I actually spent a considerable amount of time searching for similar benchmarks a few weeks ago.

bborud over 12 years ago

"After some of the results had been presented to the public, some observers said MongoDB should not be compared to other NoSQL databases because it is more targeted at working with memory directly. We certainly understand this, but the aim of this investigation is to determine the best use cases for different NoSQL products. Therefore, the databases were tested under the same conditions, regardless of their specifics."

Oh, great.

nirvana over 12 years ago

It's a shame they didn't test Couchbase 2.0 (or at least 1.8)... and really kinda silly, given that it is pretty widely used commercially. I think Couchbase may be the most successful NoSQL database when it comes to commercial installations (for large customers; maybe MongoDB has more total customers).

Plus, I think it would have scored very well here.
