I have built a distributed key-value store on top of Tokyo Tyrant (which is a network interface on top of Tokyo Cabinet). I have also tried memcachedb (which is based on Berkeley DB), and Tokyo Tyrant outperforms it, especially with large numbers of items. memcachedb also has problems with its log (it grows extremely large, and db checkpoints stall the database for several seconds).<p>One can also script Tyrant via Lua (which is pretty amazing) and there's master-master replication. Here's how to implement incr via Lua: <a href="http://paste.pocoo.org/show/103848/" rel="nofollow">http://paste.pocoo.org/show/103848/</a> I have currently extended it with a fixed list, and Mikio (the dev of Tokyo *) has implemented an inverted index for search.<p>Currently I have my key-value store in production (it holds around 1 million key-value pairs) and the performance is pretty good.<p>I will release it as open source. It's Python-only for now, but the code is around 200 lines plus a consistent hash library (which I implemented in under 50 lines of code: <a href="http://amix.dk/blog/viewEntry/19367" rel="nofollow">http://amix.dk/blog/viewEntry/19367</a> ), so it should be possible to port it over.
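For readers unfamiliar with the consistent hashing mentioned in the comment above, here is a minimal sketch of the general technique in Python. It is not the code from the linked post: the `HashRing` class, its method names, the `replicas` parameter, and the `host:port` node strings are all made up for illustration, and MD5 is just one convenient choice of hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (hypothetical names, not the linked library)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the distribution
        self._keys = []           # sorted hash positions on the ring
        self._ring = {}           # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Map a string to a large integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Place `replicas` virtual points for this node on the ring.
        for i in range(self.replicas):
            h = self._hash("%s:%d" % (node, i))
            self._ring[h] = node
            bisect.insort(self._keys, h)

    def remove_node(self, node):
        # Remove only this node's points; other nodes' points stay put.
        for i in range(self.replicas):
            h = self._hash("%s:%d" % (node, i))
            del self._ring[h]
            self._keys.remove(h)

    def get_node(self, key):
        # A key belongs to the first point clockwise from its hash (wrapping around).
        if not self._keys:
            return None
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

Usage would look something like `HashRing(["10.0.0.1:1978", "10.0.0.2:1978"]).get_node("user:42")`, which returns the server that should hold that key. The point of the design is that removing a server only remaps the keys that lived on it; keys on the remaining servers keep their placement.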
Sounds impressive and cool. I wonder about Table engine performance though.<p>My experiences with Berkeley DB (using the native C API) were positive, from a performance point of view, largely <i>because</i> BDB has no built-in query support. When I wanted to look something up, and I had no index on the desired value (BDB calls indices "secondary databases"), I always knew I was doing a linear table scan, because I had to write the loop myself. No SQL meant that inefficient queries could not hide in plain sight.<p>Berkeley DB trades off the flexibility of SQL queries for (1) excellent lookup performance in simpler cases, and (2) the ability to mix relational and object-based data storage, reducing the famous impedance mismatch of relational data modeling. For some applications, this trade-off works beautifully. I daresay that it holds true for most web apps which really don't need a relational data store, and either outgrew pure in-memory storage or just sensibly want to persist their state to disk. (Hint: if you use an ORM, then you probably don't want a relational data store.)<p>For some other apps, the lack of SQL queries hurts. I once worked on a trading system implemented on top of a proprietary engine which did not support arbitrary queries. It made writing even trivial reports ("list all transactions done two days ago by trader X using instrument Y") unnecessarily painful. (Of course, on a typical RDB with a non-trivial schema and a large dataset, these arbitrary queries would take 20 minutes to run, so catch-22.)<p>The Table engine for Tokyo Cabinet seems to do everything implicitly. Fast and arbitrary queries. If it works on the kind of schema-less free-form data the author refers to, and has working master-master replication, that sounds almost too good to be true.
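To make the scan-versus-index point in the comment above concrete, here is a rough Python sketch. Plain dicts stand in for a primary database and a hand-maintained secondary index; this is not the Berkeley DB API, and the function names and the `trader`/`instrument` fields are invented for illustration.

```python
from collections import defaultdict

primary = {}                  # txn id -> record (the "primary database")
by_trader = defaultdict(set)  # trader -> txn ids (a hand-kept "secondary index")


def put(txn_id, record):
    # Keep the secondary index in step with the primary store on every write.
    old = primary.get(txn_id)
    if old is not None:
        by_trader[old["trader"]].discard(txn_id)
    primary[txn_id] = record
    by_trader[record["trader"]].add(txn_id)


def lookup_by_trader(trader):
    # Indexed lookup: touches only the matching records.
    return [primary[i] for i in sorted(by_trader.get(trader, ()))]


def scan_by_instrument(instrument):
    # No index on this field, so the full scan is spelled out in the code.
    return [r for r in primary.values() if r["instrument"] == instrument]
```

Because the loop in `scan_by_instrument` has to be written by hand, its cost is visible at the call site, which is the property the comment credits for BDB's good performance in practice.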
For Tokyo Tyrant there's a Python library:
<a href="http://code.google.com/p/pytyrant/" rel="nofollow">http://code.google.com/p/pytyrant/</a><p>It uses the binary protocol and it's written in a very simple and very clean way - and it should be production ready.<p>I would not recommend using the memcached protocol with Tyrant (it offers fewer operations and it isn't binary).
The article is mostly about the Ruby bindings, but Tokyo Cabinet is written in C99 and has bindings for most of the usual suspects. English language specifications are here:
<a href="http://tokyocabinet.sourceforge.net/spex-en.html" rel="nofollow">http://tokyocabinet.sourceforge.net/spex-en.html</a>
Haven't used it, but Erlang bindings: <a href="http://dukesoferl.blogspot.com/2008/06/tokyocabinet-and-mnesia.html" rel="nofollow">http://dukesoferl.blogspot.com/2008/06/tokyocabinet-and-mnes...</a>
So, this is a BDB clone without the years of testing of BDB? That's exactly what I want for my critical data.<p>I am really confused as to why one would implement this in C. One pointer math problem, and <i>boom</i>, gigs of data are corrupted. If you are going to reimplement BDB, you might as well do it in Haskell or OCaml and buy yourself a little type safety. (Incidentally, this is on my TODO list, but BDB works well enough for us at this point.)
There is also a full-text search system for Tokyo Cabinet called Tokyo Dystopia (<a href="http://tokyocabinet.sourceforge.net/dystopiadoc/" rel="nofollow">http://tokyocabinet.sourceforge.net/dystopiadoc/</a>). Has anyone written a Ruby API for this?<p>I was wondering whether it would be a better option for Rails search than Xapian and Sphinx.