Our in-house DB at Stream also runs on top of RocksDB + Raft. It's amazing just how much faster it is than anything else out there (especially compared to Cassandra). Instagram uses RocksDB as the storage engine for Cassandra, and LinkedIn and Pinterest use RocksDB as well. Once you have the time to build your own DB on RocksDB, you get really fine-grained control over performance.<p><a href="https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users" rel="nofollow">https://stackshare.io/stream/stream-and-go-news-feeds-for-ov...</a>
RocksDB is a fork of LevelDB, which was [in]famous for how easily it corrupted data. Did Facebook ever do anything to keep data from corrupting, or is it still a common operational problem? (You see it more at larger scales.)<p>Here's an example of how painful data corruption can be, using Riak on LevelDB. The LevelDB data would corrupt often, which left you in a predicament. Say you had 10 nodes with a replication factor of 3, and the whole cluster is humming along at a decent clip. Now one node's LevelDB corrupts, and you have to rebuild it. If you have a huge fuckoff dataset, this can take a while. Then another node goes down. For any data replicated on both of those nodes, only 1 copy is left, and with 2 nodes down, 8 nodes are doing the work of 10; if you have any more failures, that data might be gone. Now add the replication traffic for the rebuild, which sucks performance and bandwidth away from the regular work. And because LevelDB corrupted so easily and so often, you needed hash trees to quickly identify which data was corrupt, and then you had to fix it and rebuild the hash trees, which also ate into performance. Finally, you can't just add new nodes during a rebuild, because the extra load makes the cluster fall over. And the more nodes you have, the higher the likelihood of failures.
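The hash-tree step described above can be sketched roughly like this. It's a toy illustration, not Riak's actual anti-entropy code: it assumes both replicas hash keys into the same fixed, power-of-two number of buckets, and the helper names (`bucket_of`, `build_tree`, `diff_buckets`) are hypothetical. Comparing the trees top-down means you only descend into subtrees whose hashes differ, so you can locate the damaged key range without shipping the whole dataset between nodes.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, num_buckets: int) -> int:
    # Stable bucket assignment (Python's built-in hash() is randomized per process).
    return int.from_bytes(_h(key.encode())[:4], "big") % num_buckets


def build_tree(store: dict, num_buckets: int) -> list:
    """Return the tree as a list of levels: levels[0] = leaves, levels[-1] = [root].
    num_buckets must be a power of two so every level pairs up evenly."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in store.items():
        buckets[bucket_of(key, num_buckets)].append((key, value))
    # Sort within each bucket so a leaf hash doesn't depend on insertion order.
    leaves = [_h(b"".join(f"{k}={v};".encode() for k, v in sorted(b))) for b in buckets]
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_buckets(a: list, b: list) -> list:
    """Return indices of leaf buckets whose hashes differ, descending only
    into subtrees whose hashes mismatch."""
    frontier = [0]  # start at the root
    for depth in range(len(a) - 1, 0, -1):
        mismatched = [i for i in frontier if a[depth][i] != b[depth][i]]
        # Children of node i on the level below are 2*i and 2*i + 1.
        frontier = [c for i in mismatched for c in (2 * i, 2 * i + 1)]
    return [i for i in frontier if a[0][i] != b[0][i]]


replica_a = {f"user:{i}": f"v{i}" for i in range(1000)}
replica_b = dict(replica_a)
replica_b["user:42"] = "corrupted"  # simulate one bad value on one replica

diff = diff_buckets(build_tree(replica_a, 64), build_tree(replica_b, 64))
print(diff)  # a single bucket index: the one that holds "user:42"
```

Only the keys in the mismatching buckets then need to be compared and re-copied, though every write still has to keep the trees up to date, which is part of the overhead described above.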
I've noticed that RocksDB is used very often in OLTP scenarios. What's the OLAP equivalent of RocksDB? Apache Parquet? Apache Arrow? What would you use these days to build a high-performance OLAP or hybrid (HTAP) engine?
Excellent article, very informative.<p>I just had to chuckle at this:<p>> Non-engineers: in a computer, a move is always implemented as a copy followed by a delete<p>Yeah, that's really gonna help a non-developer understand the article better...
> If you surveyed most NewSQL databases today, most of them are built on top of an LSM, namely, RocksDB.<p>Is this actually true?<p>Spark, FoundationDB, MemSQL, NuoDB, Citus: I'm not sure any of these are built on top of RocksDB.<p>Which ones are actually built on an LSM?
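For anyone wondering what "built on an LSM" means in practice, here's a toy log-structured merge tree: writes go into an in-memory memtable, a full memtable is flushed as an immutable sorted run (the SSTable), reads check newest to oldest, deletes are tombstones, and compaction merges runs. `TinyLSM` is a made-up name, and this sketch skips everything that makes RocksDB fast or durable (WAL, levels, Bloom filters, block cache, actual files on disk).

```python
import bisect

TOMBSTONE = object()  # marker meaning "this key was deleted"


class TinyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}
        self.runs = []  # list of (sorted_keys, values) pairs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # A delete is just another write; the tombstone shadows older versions.
        self.put(key, TOMBSTONE)

    def _flush(self):
        # Freeze the memtable into an immutable sorted run.
        keys = sorted(self.memtable)
        self.runs.append((keys, [self.memtable[k] for k in keys]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        for keys, values in reversed(self.runs):  # newest run first
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return None if values[i] is TOMBSTONE else values[i]
        return None

    def compact(self):
        # Full compaction: merge every run (oldest first, so newer values win)
        # and drop tombstones. That's safe only because no older run remains for
        # them to shadow; the memtable is newer and is still checked first.
        merged = {}
        for keys, values in self.runs:
            merged.update(zip(keys, values))
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        keys = sorted(live)
        self.runs = [(keys, [live[k] for k in keys])] if keys else []


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # memtable full, flushed as run 1
db.put("a", "3")
db.delete("b")     # tombstone; memtable full, flushed as run 2
print(db.get("a"), db.get("b"))  # 3 None
db.compact()       # runs merged into one; the tombstone for "b" is gone
```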
If you love LevelDB/RocksDB but want a pure-Go implementation, I have good things to say about this library:<p><a href="https://github.com/syndtr/goleveldb" rel="nofollow">https://github.com/syndtr/goleveldb</a>
Seems like it has a few extra features:
1. SSTables as separate files
2. range deletes (a rare feature)<p>compared to LMDB (which is faster and more efficient):
<a href="https://symas.com/lmdb/technical/" rel="nofollow">https://symas.com/lmdb/technical/</a><p>It would still be nice to see how LMDB fares in a complex distributed DBMS (most of them are built on RocksDB-type libraries).<p>But LMDB is meant to stay small, so additional features live in a fork: <a href="https://github.com/leo-yuriev/libmdbx" rel="nofollow">https://github.com/leo-yuriev/libmdbx</a>
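Since range deletes came up: the idea is that an LSM can delete a whole key range with a single write by recording a range tombstone instead of one tombstone per key, and reads then treat any older value inside a covering range as deleted. RocksDB exposes this as `DeleteRange` over a half-open `[begin, end)` range. The toy below only illustrates the idea; `RangeDeleteStore` is hypothetical, keeps just the latest version per key, and never compacts its tombstones away.

```python
class RangeDeleteStore:
    def __init__(self):
        self.seq = 0                 # global sequence number, bumped on every write
        self.points = {}             # key -> (seqno, value); latest version only
        self.range_tombstones = []   # (start, end, seqno) covering [start, end)

    def _next_seq(self) -> int:
        self.seq += 1
        return self.seq

    def put(self, key: str, value: str) -> None:
        self.points[key] = (self._next_seq(), value)

    def delete_range(self, start: str, end: str) -> None:
        # One write, no matter how many keys fall in [start, end).
        self.range_tombstones.append((start, end, self._next_seq()))

    def get(self, key: str):
        if key not in self.points:
            return None
        seqno, value = self.points[key]
        for start, end, t_seq in self.range_tombstones:
            # A tombstone hides the value only if it covers the key and is newer.
            if start <= key < end and t_seq > seqno:
                return None
        return value


store = RangeDeleteStore()
store.put("user:001", "alice")
store.put("user:002", "bob")
store.put("zeta", "z")
store.delete_range("user:", "user;")   # ';' sorts right after ':' in ASCII
store.put("user:002", "bob-again")     # written after the tombstone, so visible
print(store.get("user:001"), store.get("user:002"), store.get("zeta"))
# None bob-again z
```

The catch is on the read side: every lookup has to consider range tombstones, which is part of why real implementations need extra machinery to keep them cheap.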