I found this older introduction (part of a series) to be pretty good:
<a href="https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f" rel="nofollow">https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2...</a><p>Besides the use of LSM trees in RocksDB and leveldb-like databases, there is also the WiscKey approach (<a href="https://www.usenix.org/node/194425" rel="nofollow">https://www.usenix.org/node/194425</a>), which reduces read/write amplification by keeping the LSM tree small and moving the values out into a separate value log. There's a pure Go implementation of the WiscKey approach used by dgraph: <a href="https://github.com/dgraph-io/badger" rel="nofollow">https://github.com/dgraph-io/badger</a>
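To make the key-value separation idea concrete, here's a minimal Go sketch (not Badger's actual API; ValueLog, ValuePointer, and the length-prefixed record layout are made up for illustration): the LSM tree would only keep keys plus small pointers, while values are appended to a separate log.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// ValuePointer locates a value inside the append-only value log.
// The LSM tree would only store the (small) key plus this pointer,
// which keeps the tree shallow and cuts write amplification.
type ValuePointer struct {
	Offset uint64 // byte offset of the record header in the value log
	Length uint32 // length of the value in bytes
}

// ValueLog is a hypothetical append-only log holding the actual values.
type ValueLog struct {
	f      *os.File
	offset uint64
}

// Append writes a length-prefixed value and returns a pointer to it.
// The caller would then insert (key -> pointer) into the LSM tree.
func (vl *ValueLog) Append(value []byte) (ValuePointer, error) {
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(value)))
	if _, err := vl.f.Write(append(hdr[:], value...)); err != nil {
		return ValuePointer{}, err
	}
	ptr := ValuePointer{Offset: vl.offset, Length: uint32(len(value))}
	vl.offset += uint64(4 + len(value))
	return ptr, nil
}

// Read fetches a value back given its pointer -- one extra random read
// compared to a plain LSM tree, which is WiscKey's main trade-off.
func (vl *ValueLog) Read(ptr ValuePointer) ([]byte, error) {
	buf := make([]byte, ptr.Length)
	_, err := vl.f.ReadAt(buf, int64(ptr.Offset)+4)
	return buf, err
}

func main() {
	f, _ := os.CreateTemp("", "vlog")
	defer os.Remove(f.Name())
	vl := &ValueLog{f: f}

	ptr, _ := vl.Append([]byte("a potentially large value"))
	got, _ := vl.Read(ptr)
	fmt.Printf("%s\n", got) // a potentially large value
}
```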
If you just need to fetch values by a key for the main storage (the key might even be as simple as one produced by a sequence generator), you can avoid the asynchronous background compaction overhead, and thus unpredictable read or write latency peaks, by hashing the keys if they aren't already integer/long identifiers: basically storing a persistent hash array based trie (persistent both in the sense of on-disk storage and of functional immutability). This can easily be extended to store a new revision through copy-on-write semantics; see the sketch below. Instead of storing whole page snapshots, however, storage advances now permit fine-grained access to your data, so you can apply lessons learned from backup systems to version the data pages themselves and even improve on that.<p>Disclaimer: I'm the author of a recent article about a free open source storage system I'm maintaining, which versions data at its very core: "Why Copy-on-Write Semantics and Node-Level-Versioning are key to Efficient Snapshots": <a href="https://hackernoon.com/sirix-io-why-copy-on-write-semantics-and-node-level-versioning-are-key-to-efficient-snapshots-754ba834d3bb" rel="nofollow">https://hackernoon.com/sirix-io-why-copy-on-write-semantics-...</a>
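To illustrate the copy-on-write idea, here's a minimal in-memory Go sketch of path copying in a hash-trie-like index (not Sirix's actual implementation; the node layout, fanout, and names are assumptions): inserting into one revision produces a new root that shares all untouched subtrees with the old one, so older revisions stay readable at no extra cost.

```go
package main

import "fmt"

const fanout = 16 // 4 bits of the hash consumed per trie level

// node is one level of the hash trie: inner nodes hold children,
// the node at maximum depth holds a value. Nodes are never mutated
// after creation, so old revisions remain valid.
type node struct {
	children [fanout]*node
	value    string
	hasValue bool
}

// insert returns a NEW root that shares every untouched subtree with
// the old one (path copying): only the handful of nodes on the path
// from the root down to the affected leaf are duplicated.
func insert(n *node, hash uint64, depth uint, value string) *node {
	cp := &node{}
	if n != nil {
		*cp = *n // shallow copy: child pointers are shared, not cloned
	}
	if depth == 16 { // 16 levels * 4 bits = full 64-bit hash consumed
		cp.value = value
		cp.hasValue = true
		return cp
	}
	slot := (hash >> (depth * 4)) & (fanout - 1)
	cp.children[slot] = insert(cp.children[slot], hash, depth+1, value)
	return cp
}

// lookup walks one revision of the trie for the given hash.
func lookup(n *node, hash uint64, depth uint) (string, bool) {
	if n == nil {
		return "", false
	}
	if depth == 16 {
		return n.value, n.hasValue
	}
	slot := (hash >> (depth * 4)) & (fanout - 1)
	return lookup(n.children[slot], hash, depth+1)
}

func main() {
	// Each insert yields a new revision root; creating one is cheap
	// because unchanged subtrees are shared between revisions.
	rev1 := insert(nil, 0xCAFE, 0, "v1")
	rev2 := insert(rev1, 0xBEEF, 0, "v2")

	v, _ := lookup(rev2, 0xCAFE, 0)
	fmt.Println(v) // v1: still reachable from the new revision
	_, ok := lookup(rev1, 0xBEEF, 0)
	fmt.Println(ok) // false: the old revision is untouched
}
```

An on-disk version would write the copied nodes to new pages instead of allocating them in memory, which is where the backup-system-style page versioning mentioned above comes in.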
I don't understand the hash index part. I guess that for every segment on disk, you also have a hash table for it, correct? Also, that part doesn't seem to be very memory-efficient.