
Building an Application upon Riak - Part 1

57 points by timf over 13 years ago

7 comments

lucaspiller over 13 years ago
Thanks for sharing. We have been using Riak for the last year and have had lots of fun learning a few lessons the hard way. Now that we have learnt them, everything seems to be fine, apart from nodes that stop responding when they start merging Bitcask data.

I think one of the biggest issues we had was that we saw it as a kind of drop-in replacement for MySQL. We originally used the Ripple library (protip: don't), which talked to Riak over the HTTP interface (it now supports Protobuffs, but still don't), plonked loads of data in, then wrote some lovely MapReduces to get it out. It worked fine, except it was slow. Dog slow.

The issue is that to do a MapReduce you need to get a list of keys to process. However, in Riak listing the keys of a bucket is slow, as it has to walk the whole keyspace (even other buckets) to find the matching keys. I'm guessing this is why Riak Search is slow for you as well. We didn't try it, but if you kept track of the keys yourself and fed these to a MapReduce I suspect it would be a lot faster.

Now we are using it, as in your epiphany, as a persistent distributed version of memcached. We use predictable keys for everything, so we just need to do gets and puts. We also found that trying to deal with siblings is a pain, so we turned them off (allow_mult=false, last_write_wins=true). For any important documents that need to be updated, where we can't afford to lose data due to race conditions, we just write a new version under a different predictable key (kind of like a linked list). It works lovely now :)

As stated in the article, enterprise support is good, but we have had most of our questions answered rather quickly on IRC or the mailing list. There is almost always somebody around from Basho to help you out. The Riak Recaps are also a great way to keep up to date with what is going on in the community.
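[Editor's note: the predictable-key pattern described above is concrete enough to sketch. Below is a minimal illustration using the Basho Python client (the commenter actually used Ruby/Ripple, and no schema is given in the thread); the bucket name, user-id key scheme, and explicit version counter are invented for illustration only. The bucket properties mirror the allow_mult=false / last_write_wins=true settings mentioned in the comment.]

```python
# Sketch of the "predictable keys" pattern: every object lives under a key
# you can derive from data you already have, so reads are plain gets and
# no key listing or MapReduce is ever needed. Bucket/key names are hypothetical.
import riak

client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)

# Turn off siblings for this bucket, as the comment suggests.
profiles = client.bucket('profiles')
profiles.set_properties({'allow_mult': False, 'last_write_wins': True})

def profile_key(user_id, version):
    # Predictable key, derivable without consulting the database.
    return 'user:%s:v%d' % (user_id, version)

def write_profile(user_id, version, data):
    # Rather than updating in place (and risking lost updates under
    # concurrent writers), write a new immutable version under the next
    # predictable key, linked-list style.
    obj = profiles.new(profile_key(user_id, version), data=data)
    obj.store()

def read_profile(user_id, version):
    return profiles.get(profile_key(user_id, version)).data

write_profile(42, 1, {'name': 'Alice', 'plan': 'free'})
write_profile(42, 2, {'name': 'Alice', 'plan': 'pro'})
print(read_profile(42, 2))
```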
lobster_johnson over 13 years ago
Riak is interesting, but its performance characteristics are unfortunately interesting in a negative sense.

First, single-node performance is awful. You would think that even on a single node, simple key/value lookup would be able to compete against something like Postgres; after all, the read pipeline should be much simpler. But it doesn't come close in my testing. Not just key/value lookups; it's also dog slow at chasing links, and scales very badly with the number of links.

Now, Riak is intended to be deployed on many nodes, but in my testing with three nodes I did not see a significant increase in performance. The overhead seemed to be constant. In a master-less, distributed database, some compromise on performance is acceptable, but Riak looks much worse than would be acceptable.

Riak is focused on random access. Sequential access similar to that offered by Cassandra and HBase just won't scale beyond a few hundred objects. This problem is made worse by the fact that a Riak "bucket" isn't really a bucket, but a namespace; all data in a Riak database resides in the same storage, and bucket operations simply filter that data based on the bucket name. This means that any sequential access is off the table.

The Riak people suggest using links to emulate sequential access patterns and entity relationships. For example, if you want a blog post with a bunch of comments, you would store the post under one key, the comments under other keys, and then use child/parent links to glue them together. Unfortunately, this seems to scale very badly with the number of objects linked.

It's not just read performance, either; it seems links are stored in a single chunk whenever an object is stored, so when adding a single child you have to write all the children back to the object. Obviously that does not scale to thousands or even millions of objects. For a database that has no other means of dealing with relationships, the weakness of the link feature is serious.

After all, you would face the same problem if you stored the list of objects as a key's value (you have to rewrite the entire object every time, since Riak does not have partial updates), and since it's not feasible to scan keys by prefix, for example, you end up with no alternative. I suppose you could use Riak Search, but last I checked it was built on top of Solr, which is not fast.

There are various minor annoyances, too. For example, unlike Cassandra, there is no operation to delete the entire contents of a bucket (something you want to do before/after unit tests), so you have to sequentially enumerate the bucket's keys and delete each one, which is just excruciatingly slow.

Here is a test case I wrote in Ruby to test performance: http://dl.dropbox.com/u/12091499/Share/riaktest.tar.gz. It contains a script to run similar operations/queries against Postgres. The included Riak test results are for a three-node dedicated test cluster (plenty of CPU, RAM, I/O). I am open to the possibility that the config needs tuning, of course.
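[Editor's note: the "no way to empty a bucket" annoyance is simple to illustrate. This sketch again assumes the Basho Python client and a hypothetical test bucket; the workaround is slow precisely because the key listing walks the entire keyspace, as both commenters point out.]

```python
# Clearing a bucket before/after a test run: Riak (at the time) had no
# "truncate bucket" operation, so the only option was to list every key
# and delete them one at a time. Bucket name is just an example.
import riak

client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
bucket = client.bucket('test_fixtures')

for key in bucket.get_keys():   # full keyspace walk -- slow on any real dataset
    bucket.delete(key)          # plus one round trip per key
```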
joevandyk over 13 years ago
"I/O - It's not easy to run Riak on AWS. It loves I/O. To be fair, they say this loud and clear so that's my problem. We originally tried a fancy EBS setup to speed it up and make it persistent. In the end we ditched all that and went ephemeral. It was dramatically more stable for us overall."

If you're running any type of database or service on EC2 where latency is an issue, you should be using ephemeral/instance storage, not EBS.
Comment #3279312 not loaded
someone13 over 13 years ago
Unlike a lot of other NoSQL solutions, there are remarkably few "real-world" stories about Riak, so I love hearing about it when I can. It's also interesting to hear about some of the real-world problems that people have encountered. I've been looking at Riak for a while now for a project of mine, so these kinds of articles are very welcome!
jstin over 13 years ago
I have used Riak in a production app. We just recently switched to Redis.

Riak was great for the most part. There were three nodes running on three different machines, and setting up a new node is a breeze. Eventually a data inconsistency developed between the nodes that couldn't be resolved, and it would happen intermittently.

When this happened, any sort of MapReduce operation would fail. Individual keys could be requested, but only from a node that had the item.

In addition, removing a node in this inconsistent state was a no-go. The other nodes would stay in the inconsistent state for good.

Interestingly enough, this only happened on certain buckets. There was a logger bucket that had keys updating and being created all the time that never had an issue.

Perhaps it was a bad configuration on our end...
Comment #3279701 not loaded
ryanfitz over 13 years ago
I really enjoyed this review; it came off as very honest, and you didn't try to be overly pro or con, just described what you ran into over the past year.

You mention that no reasonable Scala client exists; why not create and release your own? I know Riak talks about offering full-text search, but one thing I've learned over the years is that a purpose-built search engine (such as Lucene/Solr/IndexTank) is typically far more performant and scalable, plus easier to implement, than a generic datastore. I've never used Riak for search, so I have a few questions. Did you ever attempt to use a different datastore for search? If you did, did Riak perform just as well at first, and if not, why not?
Comment #3279599 not loaded
dabeeeenster over 13 years ago
Great read. One thing that would have been useful to know is what sort of dataset size they were working with. If you're not working with millions of documents, are search and link walking still viable?