Tech Echo

This is pretty similar to Sparkey[0] and bam[1]. Sparkey also grew out of cdb's limitations. It supports block-level compression like Riffle does, and is optimized for bulk writes. Riffle's linear-time merge behavior, lifted from Sorted String Tables, is a nice alternative to accepting writes at runtime. bam is cool in that it takes a plain separated-values file as input and builds an index file from a minimal perfect hash function over that file.

[0]: https://github.com/spotify/sparkey
[1]: https://github.com/StefanKarpinski/bam
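To make the "linear-time merge" point concrete, here is a minimal Python sketch of merging two key-sorted runs in a single pass, in the spirit of SSTable-style compaction. It assumes each run is an iterable of (key, value) pairs already sorted by key with unique keys, and that the newer run's value wins on a duplicate key; `merge_sorted` is a hypothetical name, not Riffle's or Sparkey's actual API.

```python
from typing import Iterable, Iterator, Optional, Tuple

Pair = Tuple[str, str]


def merge_sorted(older: Iterable[Pair], newer: Iterable[Pair]) -> Iterator[Pair]:
    """Merge two key-sorted runs in one pass: O(len(older) + len(newer)).

    Assumes keys are unique within each run; on a key present in both,
    the entry from `newer` shadows the one from `older`.
    """
    a, b = iter(older), iter(newer)
    x: Optional[Pair] = next(a, None)
    y: Optional[Pair] = next(b, None)
    while x is not None and y is not None:
        if x[0] < y[0]:
            yield x
            x = next(a, None)
        elif y[0] < x[0]:
            yield y
            y = next(b, None)
        else:
            yield y  # same key: newer value wins, older one is dropped
            x, y = next(a, None), next(b, None)
    # At most one run still has entries; copy its tail through unchanged.
    while x is not None:
        yield x
        x = next(a, None)
    while y is not None:
        yield y
        y = next(b, None)


old_run = [("a", "1"), ("c", "3"), ("e", "5")]
new_run = [("b", "20"), ("c", "30"), ("f", "60")]
print(list(merge_sorted(old_run, new_run)))
# prints [('a', '1'), ('b', '20'), ('c', '30'), ('e', '5'), ('f', '60')]
```

Because each input is consumed strictly in order, the merge holds only one entry per run in memory and streams its output, which is what makes building a new file from two large existing ones cheap.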

"While memory-mapping is used for the hashtable, values are read directly from disk, decoupling our I/O throughput from how much memory is available."

Whether you're mmap'ing or using read(), you go through the page cache before you hit the disk, and potentially evict its least-recently-used page. Glancing through the source, it doesn't look like they're using actual "direct I/O" (which, to be performant, would need its own caching layer).

That being the case, for lots of tiny reads and writes I'd expect mmap to outperform read() and write().
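A minimal Python sketch of the two read paths being compared, assuming a POSIX system (`os.pread` is Unix-only). Both functions return the same bytes, and on Linux both are served from the page cache; the difference is that `read_at_pread` makes one system call per lookup, while `read_at_mmap` reads mapped pages directly and only traps into the kernel on a page fault. The helper names are hypothetical, and this shows the mechanics only, not a benchmark.

```python
import mmap
import os
import tempfile


def read_at_pread(fd: int, offset: int, length: int) -> bytes:
    # One pread(2) syscall per lookup; the kernel copies the bytes
    # out of the page cache (reading from disk first on a miss).
    return os.pread(fd, length, offset)


def read_at_mmap(mm: mmap.mmap, offset: int, length: int) -> bytes:
    # No syscall once the file is mapped: the slice reads the mapped
    # page-cache pages directly, page-faulting if they aren't resident.
    # Slicing still copies the bytes into a new bytes object.
    return mm[offset:offset + length]


def demo() -> bytes:
    with tempfile.TemporaryFile() as f:
        f.write(b"hello, world")
        f.flush()
        fd = f.fileno()
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            via_pread = read_at_pread(fd, 7, 5)
            via_mmap = read_at_mmap(mm, 7, 5)
    assert via_pread == via_mmap
    return via_mmap


print(demo())  # prints b'world'
```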

Riffle: a high-performance write-once key/value storage engine for Clojure

2 comments
