I like this article: while it is certainly not the definitive guide to NoSQL, it is a short, mostly factual description that people new to the field can use to get an idea of which database might be a good candidate for initial experimentation, given a well-defined problem to solve.<p>That said, I think picking the right database is something you can only do with a lot of work. Picking good technologies for your project is <i>hard work</i>: you have to try one, then another, and so forth, and even reconsider the state of things after a few years (or months?), given how fast the DB landscape has evolved in recent years.<p>While I'm at it, I'd like to share that these days I'm working on a Redis disk back end. I already have a working prototype after a few days of full immersion (I like to use vacation time to work on completely new ideas for Redis).<p>The idea is that everything is stored on disk, in a plain key-value database (complex values are serialized when on disk), while memory is used as an object cache.
It is like taking the current Redis Virtual Memory and inverting the logic completely. The result is the same (working set in memory, the rest on disk), but this implementation means that there are no limits on the data you can put into a single instance, restarts are not slow (data is not loaded into memory unless requested), and there is no need to fork() to save. Keys marked as "dirty" (modified) are transferred to disk asynchronously, as needed, by IO threads.<p>If everything works as I expect (and initial tests are really encouraging), Redis 2.4 will be out in a few months, completely killing the current Virtual Memory implementation in favor of the new "two back ends" design, where you can choose between running an in-memory DB or an on-disk DB where memory is just an LRU cache for the working set.
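The disk-first design described above can be sketched in a few dozen lines. This is a minimal Python illustration under stated assumptions, not Redis code: a plain `MutableMapping` of bytes stands in for the on-disk key-value database, `pickle` stands in for whatever serialization Redis would use for complex values, a single background thread plays the role of the IO threads, and the class and method names are hypothetical.

```python
import pickle
import queue
import threading
from collections import OrderedDict
from typing import Any, MutableMapping, Optional


class DiskBackedStore:
    """Hypothetical sketch: disk is the source of truth, memory is an LRU object cache."""

    def __init__(self, disk: MutableMapping[str, bytes], cache_size: int = 1000) -> None:
        self.disk = disk               # stand-in for the on-disk key-value database
        self.cache_size = cache_size   # max number of objects kept in memory
        self.cache = OrderedDict()     # key -> deserialized object, least recent first
        self.dirty = set()             # keys modified in memory but not yet on disk
        self.lock = threading.Lock()
        self.io_queue = queue.Queue()  # dirty keys waiting for the IO thread
        threading.Thread(target=self._io_worker, daemon=True).start()

    def get(self, key: str) -> Optional[Any]:
        with self.lock:
            if key in self.cache:
                self.cache.move_to_end(key)  # mark as most recently used
                return self.cache[key]
            raw = self.disk.get(key)  # cache miss: load on demand, never at startup
            if raw is None:
                return None
            value = pickle.loads(raw)
            self._cache_put(key, value)
            return value

    def set(self, key: str, value: Any) -> None:
        with self.lock:
            self._cache_put(key, value)
            if key not in self.dirty:  # queue each dirty key only once
                self.dirty.add(key)
                self.io_queue.put(key)

    def flush(self) -> None:
        """Block until the IO thread has processed every queued dirty key."""
        self.io_queue.join()

    def _cache_put(self, key: str, value: Any) -> None:
        # Caller must hold self.lock.
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            old_key, old_value = self.cache.popitem(last=False)  # least recently used
            if old_key in self.dirty:
                # Evicting an unsaved object: write it synchronously so it is not lost.
                self.disk[old_key] = pickle.dumps(old_value)
                self.dirty.discard(old_key)

    def _io_worker(self) -> None:
        while True:
            key = self.io_queue.get()
            with self.lock:
                # The key may already have been written out by an eviction.
                if key in self.dirty:
                    self.disk[key] = pickle.dumps(self.cache[key])
                    self.dirty.discard(key)
            self.io_queue.task_done()
```

A fresh instance over an existing `disk` starts with an empty cache and loads objects only when they are first requested, which is the "no slow restarts" property. Doing the disk write while holding the lock is a simplification that keeps a background write from racing with an eviction-time write of the same key; a real implementation would want finer-grained locking.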
Worth adding HBase?<p>Most of the below is stolen from their overview page (all of it needs to be confirmed): <a href="http://hbase.apache.org/" rel="nofollow">http://hbase.apache.org/</a><p>WRITTEN IN: Java<p>MAIN POINT: Hadoop Database<p>LICENSE: Apache<p>PROTOCOL: A REST-ful Web service gateway<p>This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.<p>HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:<p>Convenient base classes for backing Hadoop MapReduce jobs with HBase tables<p>Query predicate push down via server side scan and get filters<p>Optimizations for real time queries<p>A high performance Thrift gateway<p>A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options<p>Cascading, Hive, and Pig source and sink modules<p>Extensible JRuby-based (JIRB) shell<p>Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX<p>HBase 0.20 has greatly improved on its predecessors:<p>No HBase single point of failure<p>Rolling restart for configuration changes and minor upgrades<p>Random access performance on par with open source relational databases such as MySQL<p>FOR EXAMPLE: Facebook Messaging Database<p>BEST USE: Use it when you need random, realtime read/write access to your Big Data.
Apples vs Oranges vs Strawberries vs Pineapple vs Grapes<p>Apples usually stay crispy unless baked. Good in pies.<p>Oranges can be sour (or sweet). Do not bake.<p>Strawberries are red. Good in pies, advise against baking.<p>Pineapples are rough on the outside. Good fresh, baked, grilled, fried, debatable on pizza.<p>Grapes come in many colors and sizes. Great fresh or turned into alcoholic beverages.<p>(Not the worst introduction to fruit, but perhaps superficial? Amirite?)
This article is mostly marketing phrases from the websites of the various projects. Sadly, much of it is inaccurate, extremely skewed, or otherwise not useful for the stated purpose of comparing the listed databases.<p>For example, CouchDB's "Main Point" of "DB consistency" may hold, as it does for Redis, when there is no replication. In replicated configurations, it is definitely not true. Further, CouchDB's MVCC is weaker in many ways than what a Dynamo system like Riak offers, as you have no way to influence or discover consistency between replicas.<p>I'm sure folks who are experts in other systems can identify similar errors in the rest of the post. Can someone explain to me who the target audience is for all these NoSQL comparison articles? They are universally poor, yet universally popular.
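In Dynamo-style systems such as Riak, the knobs for influencing consistency are usually described as three numbers: N (replicas per key), R (replicas that must answer a read) and W (replicas that must acknowledge a write). The common rule of thumb is that R + W > N makes every read quorum overlap every write quorum, so a read reaches at least one replica holding the latest acknowledged write (ignoring failure-handling details such as sloppy quorums). The minimal sketch below checks that rule by brute force; the function names are made up for illustration and are not any database's API.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every set of r replicas intersect every set of w replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def rule_of_thumb(n: int, r: int, w: int) -> bool:
    """The usual shortcut: overlap is guaranteed exactly when r + w > n."""
    return r + w > n


if __name__ == "__main__":
    # For example, N=3, R=2, W=2 always overlaps; N=3, R=1, W=1 does not.
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print(n, r, w, quorums_always_overlap(n, r, w), rule_of_thumb(n, r, w))
```

The brute-force check and the shortcut agree because of the pigeonhole principle: two subsets of N replicas with sizes R and W must share an element exactly when R + W > N.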
My understanding is that in CouchDB you can't guarantee that older versions of documents will still exist (they might be there, but they could have been removed by compaction or not replicated).<p>However, there is a fairly nice way of storing older versions of documents: hold older versions as file attachments on the document. See:<p><a href="http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=[%22Versioning-docs-in-CouchDB%22]" rel="nofollow">http://jchrisa.net/drl/_design/sofa/_list/post/post-page?sta...</a>
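One way to apply the attachment idea is to build, on each update, a document body that carries the previous version as an inline attachment. This minimal sketch assumes CouchDB's inline-attachment format (`_attachments` entries with `content_type` and base64-encoded `data`, with existing attachments returned as `stub` entries on GET and preserved when sent back) and a hypothetical `rev-<rev>.json` naming scheme; the helper `versioned_update` is illustrative and not taken from the linked post. It only builds the JSON body for `PUT /{db}/{docid}`; sending the request is left out.

```python
import base64
import json
from typing import Any, Dict


def versioned_update(old_doc: Dict[str, Any], new_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: new document body that keeps old_doc as an attachment."""
    # Snapshot the old version without its attachment stubs, to avoid nesting.
    snapshot = {k: v for k, v in old_doc.items() if k != "_attachments"}
    encoded = base64.b64encode(
        json.dumps(snapshot, sort_keys=True).encode("utf-8")
    ).decode("ascii")

    # Keep earlier snapshots: CouchDB returns existing attachments as stubs on GET.
    attachments = dict(old_doc.get("_attachments", {}))
    # Hypothetical naming scheme: one attachment per superseded revision.
    attachments[f"rev-{old_doc['_rev']}.json"] = {
        "content_type": "application/json",
        "data": encoded,
    }

    new_doc = dict(new_fields)
    new_doc["_id"] = old_doc["_id"]
    new_doc["_rev"] = old_doc["_rev"]  # current revision, required for the update
    new_doc["_attachments"] = attachments
    return new_doc
```

Because the snapshots are attachments of the current revision, compaction does not discard them the way it discards old revisions, at the cost of the document growing with every edit.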
What we're missing are similar articles that go into the disadvantages and the implications for deployment.<p>E.g., I have found out that deploying Tokyo Tyrant in a Rails project requires you to write some scripts to ensure that things run properly. Also, the db size has to be set in the configuration in advance.<p>MongoDB OTOH is not designed for a single-server environment, has a very small max document size, easily gets corrupted if the process is stopped, etc.
CouchDB & MongoDB both share one property that this comparison misses (or mentions only in passing).<p>Both are schema-free datastores. For me, this is the biggest, most useful difference between them and traditional SQL databases, because it makes easy some things that are very, very hard (or inefficient) in an SQL database.<p>It's probably also worth noting that some other NoSQL solutions don't share this advantage. For example, Cassandra requires all nodes to be restarted to apply a schema change, which can be quite a big deal.
I think this is a nice closing note from @jzy:<p>A SQL query goes into a bar, walks up to two tables and asks,
"Can I join you?"
"No, but you can enjoy the view."<p>Sorry :)<p>K.
Under protocols you may want to specify MongoDB's as BSON and Cassandra's as Thrift. That would be more helpful than "binary/custom".<p>Updated:<p>Also, Redis's main selling point is its extensive support for data structures and operations on them. "Blazingly fast" really depends on what your workload is and what you're comparing it against.
There is also VertexDB, a small graph database. It's written in C and uses Tokyo Cabinet for storing data, with a simple HTTP filesystem-like interface. Its main advantage is links, which let you build graph structures at the database level.<p><a href="https://github.com/stevedekorte/vertexdb" rel="nofollow">https://github.com/stevedekorte/vertexdb</a>
You mention that some of these solutions could be used in the financial industry. I would be cautious about using them, especially since some are eventually consistent. If you are just tracking data, though, they may be fine.
I was hoping HandlerSocket would be in here. If you don't know about it, check it out <a href="http://news.ycombinator.com/item?id=1886137" rel="nofollow">http://news.ycombinator.com/item?id=1886137</a>
Interesting and useful.<p>One major feature differentiator is something it doesn't really talk about, though - how conducive is each system to Massive Data?<p>For example, he kind of has a bone to pick with Cassandra, which is probably justified. But from what little I know, one of the features of Cassandra is that it's designed to scale pretty much to infinity. That may be true of a couple of the others, but for some (like CouchDB) it isn't a design goal at all.
Thanks for the article, good information.<p>Does anyone have user numbers for the different NoSQL databases? Or even just for the two most popular ones? I guess some of them will rise above the others in the coming years and some will drop. User numbers would indicate which ones have the most potential to stay around and be accepted as standard NoSQL databases.
So if Cassandra writes are much faster than reads, why would Reddit go that route? Their comment server is consistently breaking on them, and it would seem that a sub-optimal choice of db might be partly to blame.
Good article; it's a good starting point to help people decide where to start with a NoSQL solution. But what about OrientDB? Do you plan to add it to this feature comparison?