>Graviton is currently alpha software.<p>More like the "BTRFS for key-value stores" ;)<p>Kidding aside, I dislike it when new, unproven software claims the name of industry standards like this. When I saw the headline, I was hoping this somehow actually leveraged ZFS's storage layer, but it is really just a new database that thinks Copy-on-Write is cool.
Nice!<p>I implemented pretty much the same trade-off set in an authenticated storage system.<p>Single writer, radix merkle tree, persistent storage, hashed keys, proofs.<p>I guess it is a local maximum within that trade-off space.<p>I like how the time travelling/history is always touted as a feature (which it is), but it really just means the garbage collector/pruning part of the transaction engine is missing. Postgres and other MVCC systems could all be doing this, but they don't. The hard part of the feature is being able to turn it off.<p>I'll probably have a look around later; the diffing looks interesting. Not sure yet if it's done using the merkle tree (likely) or some commit-walking algorithm.
Does anyone know of an embedded key-value store that <i>does</i> do versioning/snapshots, but <i>doesn’t</i> bother with cryptographic integrity (and so gets better OLAP performance than a Merkle-tree-based implementation)?<p>My use case is a system that serves as an OLAP data warehouse of representations of how another system’s state looked at various points in history. You’d open a handle against the store, passing in a snapshot version; and then do OLAP queries against that snapshot.<p>Things that make this a hard problem: The dataset is too large to just store the versions as independent copies; so it really needs <i>some</i> level of data-sharing between the snapshots. But it also needs to be fast for reads, especially whole-bucket reads—it’s an <i>OLAP</i> data warehouse. Merkle-tree-based designs really suck for doing indexed table scans.<p>But, things that can be traded off: there’d only need to be one (trusted) writer, who would just be batch-inserting new snapshots generated by reducing over a CQRS/ES event stream. It’d be that (out-of-band) event stream that’d be the canonical, integrity-verified, etc. representation for all this data. These CQRS state-aggregate snapshots would just be a cache. If the whole thing got corrupted, I could just throw it all away and regenerate it from the CQRS/ES event stream; or, hopefully, “rewind” the database back to the last-known-good commit (i.e. purge all snapshots above that one) and then regenerate only the rest from the event stream.<p>I’m not personally aware of anything that targets exactly this use case. I’m working on something for it myself right now.<p>Two avenues I’m looking into:<p>• something that acts like a hybrid between LMDB and btrfs (i.e. a B-tree with copy-on-write ref-counted pages shared between snapshots, where those snapshots appear as B-tree nodes themselves)<p>• “keyframe” snapshots as regular independent B-trees, maybe relying on L2ARC-like block-level dedup between them; “interstitial” snapshots as on-disk HAMT ‘overlays’ of the last keyframe B-tree, that share nodes with other on-disk HAMTs, but only within their “generation” (i.e. up to the next keyframe), such that they can all be rewritten/compacted/finalized once the next keyframe arrives, or maybe even converted into “B-frames” that have forward-references to data embedded in the next keyframe.
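The first avenue (copy-on-write ref-counted pages shared between snapshots) can be sketched in a few lines of Go. This toy uses a binary search tree rather than a real B-tree, and the `refs` bookkeeping is illustrative only (no free/prune logic) — it just shows the path-copying mechanic that lets every snapshot stay readable while sharing untouched subtrees:

```go
package main

import "fmt"

// node is a toy copy-on-write tree node. Snapshots are just root
// pointers; refs counts how many parents/roots reference a node,
// which a real engine would use to decide when a page is freeable.
type node struct {
	key, val    string
	left, right *node
	refs        int
}

// put returns a NEW root that shares all untouched subtrees with the
// old one: only the path from root to the modified key is copied.
func put(n *node, key, val string) *node {
	if n == nil {
		return &node{key: key, val: val, refs: 1}
	}
	cp := *n // copy-on-write: duplicate just this node
	cp.refs = 1
	switch {
	case key < n.key:
		cp.left = put(n.left, key, val)
		if cp.right != nil {
			cp.right.refs++ // shared subtree gains a reference
		}
	case key > n.key:
		cp.right = put(n.right, key, val)
		if cp.left != nil {
			cp.left.refs++
		}
	default:
		cp.val = val
		if cp.left != nil {
			cp.left.refs++
		}
		if cp.right != nil {
			cp.right.refs++
		}
	}
	return &cp
}

func get(n *node, key string) (string, bool) {
	for n != nil {
		switch {
		case key < n.key:
			n = n.left
		case key > n.key:
			n = n.right
		default:
			return n.val, true
		}
	}
	return "", false
}

func main() {
	var v1 *node
	for _, kv := range [][2]string{{"b", "2"}, {"a", "1"}, {"c", "3"}} {
		v1 = put(v1, kv[0], kv[1])
	}
	v2 := put(v1, "b", "20") // snapshot v1 stays fully readable
	o, _ := get(v1, "b")
	n, _ := get(v2, "b")
	fmt.Println(o, n) // prints: 2 20 — versions coexist, sharing "a" and "c"
}
```

With real B-tree pages instead of single nodes, each write copies O(depth) pages, and "purge all snapshots above a commit" becomes: drop those roots and free any page whose refcount hits zero.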
I love the idea, but I think you (the author) need a lot of time/support to polish this. You probably need a team.<p>Also,<p>>Superfast proof generation time of around 1000 proofs per second per core.<p>Does this limit in <i>any</i> way things like read/write performance or usability in general?
You can run a Graviton database. You can also run a database on a Graviton:<p><a href="https://aws.amazon.com/about-aws/whats-new/2020/07/announcing-preview-for-amazon-rds-m6g-and-r6g-instance-types/" rel="nofollow">https://aws.amazon.com/about-aws/whats-new/2020/07/announcin...</a><p>For best results, run Graviton on a Graviton:<p><a href="https://aws.amazon.com/ec2/graviton/" rel="nofollow">https://aws.amazon.com/ec2/graviton/</a>
How does this compare to Badger? Badger is also Go-native and, for me, has been exceptional at scale and for read-heavy workloads on SSD.<p>Ref: <a href="https://github.com/dgraph-io/badger" rel="nofollow">https://github.com/dgraph-io/badger</a>
What I'd really like is a multiprocess-safe embeddable database written in pure Go — that is, a database that is safe to read and write from separate processes.<p>Unfortunately, I don't think this one is multiprocess-safe.