科技回声

6 条评论

lxpz超过 1 年前

As the main author of the Merkle Search Tree paper, I'm really glad to see the design become more known and start to be used. I did not go on and build a practical system that made use of the structure, although I originally came up with the data structure while thinking about how to build decentralized social networks and data stores for collaborative applications. Thankfully, the design itself is generic and can be adpted to many use cases, and it's great to see that people have gone as far as to build independant implementations in Go and Rust. I'd be curious to see how it performs in practice, in systems that have a real-world use, and not in the simplistic synthetic benchmark scenario of the paper. Hopefully it holds up to the promise :)

评论 #39139136 未加载

评论 #39136990 未加载

LAC-Tech超过 1 年前

Normally, this is the job of vector clocks in State-based CRDTsIs the author mistaking vector clocks for version vectors? If so he would not be the first; the famous Amazon Dynamo white paper makes the same mistake when discussing their CRDT implemenetation. But verson vectors and vector clocks are different things.Alternately, I may be ignorant of a class of CRDTs solved with vector clocks.

评论 #39137374 未加载

评论 #39135547 未加载

taylorius超过 1 年前

I think the article would benefit from quickly defining what a CRDT is - (Conflict free Replicated Data Type). Even just expanding the acronym once would give a bit of an idea.

评论 #39134489 未加载

评论 #39135298 未加载

评论 #39136599 未加载

评论 #39137148 未加载

评论 #39135434 未加载

tadfisher超过 1 年前

Prolly trees are currently used (and maintained) in Dolt [1, 2], who dive a little deeper into the implementation.[1]: <a href="https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-table-data/" rel="nofollow">https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl...</a> [2]: <a href="https://docs.dolthub.com/architecture/storage-engine/prolly-tree" rel="nofollow">https://docs.dolthub.com/architecture/storage-engine/prolly-...</a>

评论 #39136025 未加载

评论 #39138345 未加载

michaelmure超过 1 年前

A practical example of a op-based Merkle-DAG CRDT is (I believe) git-bug[1]. Some doc here[2].I originally wrote something akin to an op-based CRDT and enforcing a purely linear series of commits in git's DAG, but eventually found that it doesn't really work in a multiplayer configuration. Eventually, I realized that I could instead have a real DAG to capture the concurrent editions, with "empty commits" as DAG merge operation.The result is more or less what is described in that article, with some nice properties:- written in go, I now have a generic implementation[3][4] that, given a set of operation, can easily support many practical use cases (bug tracker issue is the first, kanban and pull-requests coming).- git itself is taking care of the storage and the synchronization with peers (aka git remotes). I get the full set of upsides described in the article.- unfinished for now, but I can leverage some git construct to crypto sign each interaction with the data structure, to prove authenticity and later construct a proper access right system (who can edit a comment, who has admin right ...).- additionally to the DAG structure, I also have lamport clock to give an order between each independent DAGs (last edited bug ...). They are also used as a help to compute the final order within a DAG if there is ambiguity (concurrent editing).I'm much more an engineer than a researcher, so it'd be awesome to have the opinion of the community and especially iamwil, hecturchi or lxpz.[1]: <a href="https://github.com/MichaelMure/git-bug">https://github.com/MichaelMure/git-bug</a> [2]: <a href="https://github.com/MichaelMure/git-bug/blob/master/doc/model.md">https://github.com/MichaelMure/git-bug/blob/master/doc/model...</a> [3]: <a href="https://github.com/MichaelMure/git-bug/blob/master/entity/dag/example_test.go">https://github.com/MichaelMure/git-bug/blob/master/entity/da...</a> [4]: <a href="https://github.com/MichaelMure/git-bug/tree/master/entity/dag">https://github.com/MichaelMure/git-bug/tree/master/entity/da...</a>

hecturchi超过 1 年前

I'm the main author of the Merkle-DAGs CRDTs and I'm very happy to see this here. I'm a bit sorry now that I used the "Merkle-CRDT" name all over the paper because Merkle-Search-Trees CRDTs also deserve that tag.I think it is good to explain Merkle-DAG CRDTs (MD-CRDTs) as "merklerized "op-based CRDTs", and I would add that Merkle-Search-Tree CRDTs (MST-CRDTs) are akin to "state-based" CRDTs, to complete the analogy. This helps to bridge traditional and Merkle CRDTs and I would probably introduce it the same way, but it is also not quite right when going into some more depth. I will try to explain why.In MD-CRDTs we have a growing-DAG that works like a chain of operations that are applied in the order that the DAG is traversed. It looks a lot like an operation log, except it can have multiple branches and, if we follow my paper, operations do not need to have any total order (unlike op-based crdts). We know the latest operation in the DAG (the root node) happened AFTER its children, so there is an embedded notion of order. However, if it has multiple children, we don't know how to order the operations in their respective branches, and we don't need to because they should be commutative etc. For that to work, the paper explains that DAG-nodes embed not operations, but "state deltas". So in reality, MD-CRDTs are delta-based crdts where the merkle-dag structure stores the deltas as they are broadcasted from replicas. It's just that deltas look a lot like operations.In MST-CRDTs, if my understanding is correct, replicas will be broadcasting the roots of the DAG pointing the full state (similar to state-based CRDTs broadcasting the full state). However, thanks to Merklelization etc., changes to the state only update sections of the full DAG, and diffing, retrieving or broadcasting only the changed parts is super easy and very compact. Since the nodes values will be CRDTs themselves, they can be merged on conflict just fine. In practice, it is sort of also dealing with delta-crdts.The main thing in both is that reconstructing the full state from deltas is easy and efficient because everything is linked, so you just follow links and drop branches that you already have. Deltas without headaches.Perhaps another way to conceptually understand both is that in MD-CRDTs, you add DAG-nodes to the top of the DAG as root DAG nodes, leaving the bottom of the DAG untouched. In MST-CRDTs, you add dag nodes to the bottom (the leaves) of the DAG, which cascades into having new root DAG nodes. In both cases you broadcast new root DAG nodes and the rest of people traverse and sync.Now, what are the practical consequences for implementations:- The main problem in pure MD-CRDTs (with no caveats) is the unbounded growth (ala blockchain) even when the state becomes smaller. This only matters if you plan to delete or overwrite things from the state. In MST-CRDTs it is "easy" to discard orphaned data. However, one of the main advantages in MD-CRDTs is that "syncing" happens from the last operation/delta. This can be important if you care about having certain notion of order, i.e. that "elements added last" are the first ones to be synced into an empty replica. Another advantage may be to be able to reconstruct multiple views of the same state by re-traversing the DAG, which has "everything".- MST-CRDTs are very good for database tables, syncing is very efficient, things are compact. I haven't implemented them myself but they seem quite adequate for that use-case. When thinking of downsides, I wonder if they suffer from the cost of re-hashing operations as they grow with a very large number of keys and with heavy usage (larger blocks to hash or more levels to update). One very important advantage of MST-CRDTs is that they are more resilient if a leave-DAG-node is unretrivable. In MD-CRDTs an unretrievable DAG node can break DAG-traversal and with it a large portion of the state while in MST-CRDTs, this will affect only a key or a set of keys related to the missing DAG node.In the end, for both the devil will be in implementation details: how good is your pubsub code? How good and parallelizable is your update-processing code? How good is your GC process? How good is your blockstore? All those things become performance bottlenecks even when the theoretical properties work perfectly.

评论 #39138437 未加载

评论 #39139139 未加载

6 条评论

lxpz超过 1 年前

评论 #39139136 未加载

评论 #39136990 未加载

LAC-Tech超过 1 年前

评论 #39137374 未加载

评论 #39135547 未加载

taylorius超过 1 年前

I think the article would benefit from quickly defining what a CRDT is - (Conflict free Replicated Data Type). Even just expanding the acronym once would give a bit of an idea.

CRDTs Turned Inside Out

6 条评论

CRDTs Turned Inside Out

6 条评论