You don't need a CRDT to build a collaborative experience

236 points by zknill, over 1 year ago

27 comments

jitl, over 1 year ago

I agree broadly with the article’s position but I think locks are more harmful than helpful. When I was a Quip user (2018) it was super frustrating to get locked out of a paragraph because someone’s cursor idled there. Instead just allow last-write-wins (LWW) overwrites. If users have contention and your sync & presence is fast, they’ll figure it out pretty quick, and at most lose 1-2 keystrokes, or one drag gesture, or one color pick.

Notion is “collaborative” and we don’t use a CRDT for text, it’s all last-write-wins decided by the server. However our LWW texts are individually small - one block/paragraph in size - and adding/moving/removing blocks is intention-preserving if not perfectly convergent.

As the article says, the downside for LWW is that “offline” / async collaboration isn’t so great. That’s why we’re working on switching to CRDT for our texts. If you’re interested in bringing CRDTs to a product with a lot of users, consider joining Notion’s Docs team - https://boards.greenhouse.io/notion/jobs/5602426003 / @jitl on Twitter / jake@makenotion.com
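A rough sketch of what "last-write-wins decided by the server" on small blocks could look like. This is not Notion's implementation: `Server`, `Block`, and the sequence-number scheme are hypothetical names chosen for illustration, and the only assumption is that whichever write reaches the server last simply replaces the block.

```rust
use std::collections::HashMap;

/// One small unit of text (a block/paragraph) stored as a last-write-wins register.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub text: String,
    /// Server-assigned sequence number of the write that produced `text`.
    pub seq: u64,
}

/// Hypothetical in-memory server: arrival order at the server is the order that wins.
#[derive(Default)]
pub struct Server {
    next_seq: u64,
    blocks: HashMap<String, Block>,
}

impl Server {
    /// Overwrite a block unconditionally and return the sequence number to broadcast.
    pub fn write(&mut self, block_id: &str, text: &str) -> u64 {
        self.next_seq += 1;
        self.blocks.insert(
            block_id.to_string(),
            Block { text: text.to_string(), seq: self.next_seq },
        );
        self.next_seq
    }

    /// Current winning value for a block, if any write has reached the server.
    pub fn read(&self, block_id: &str) -> Option<&Block> {
        self.blocks.get(block_id)
    }
}
```

Because each register covers only one block, a lost write costs at most that block's latest edit; edits to other blocks are unaffected.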

lewisjoe, over 1 year ago

> everyone’s gonna say “but hey, google docs uses operational transform not CRDTs”.. OK yes, but you are not google.

Well, Google Docs works not because they somehow figured out how to converge OT edits with as much precision as CRDTs do, but simply because they have a central server which orders edits anyway and don't need true leader-less convergence.

In fact, I agree that not many things need a CRDT. CRDTs help with the mathematical rigor of convergence when you want true peer-to-peer collaboration which works without any central authority.

However, most apps work on top of a central authority anyway (SaaS apps, for example), so there is no real reason to accommodate all the complexity that comes with a CRDT. They might get far with a simpler OT-based model + central-server-based ordering.

For example, even Figma doesn't call its model a 100% pure CRDT. It's a partial, simpler CRDT implemented with the assumption that there's going to be a server that understands ordering.

It's the same with Google Docs. They don't need a CRDT because it's a cloud app after all, which means OT is more convenient, with the major heavy lifting (ordering and conflict handling) outsourced to the server.

MontagFTB, over 1 year ago

One of the main points of this article is to "just use locks", which glosses over a lot of technical complications about locking elements within a document. How long is the lock held? Can it be stolen from a user who has gone offline, or is still online but off to lunch, and we _really_ need to make this change before the presentation in an hour? What if the user comes back online and they have changes, but the lock was stolen - how are those changes reconciled to the document?

I am generally in favor of simpler is better, and if there is a way to build a collaborative experience without using CRDTs, then go for it. However, sometimes the cure can be worse than the disease, and solutions like locking may introduce more technical complexity than originally thought.
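To make those questions concrete, here is a minimal sketch of one possible answer: locks as expiring leases. Everything here (`LockTable`, `Lease`, the logical `now`/`ttl` ticks) is a hypothetical design, not something from the article. Note that it only detects a returning user whose lease lapsed; reconciling their changes is still an open problem.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Lease {
    pub holder: String,
    /// Logical time (e.g. milliseconds) at which the lease lapses.
    pub expires_at: u64,
}

#[derive(Default)]
pub struct LockTable {
    leases: HashMap<String, Lease>,
}

impl LockTable {
    /// Try to take (or renew) the lock on `element` at time `now` for `ttl` ticks.
    /// Succeeds if the element is free, already held by `user`, or the old lease expired.
    pub fn acquire(&mut self, element: &str, user: &str, now: u64, ttl: u64) -> bool {
        let blocked = matches!(
            self.leases.get(element),
            Some(l) if l.holder != user && l.expires_at > now
        );
        if blocked {
            return false;
        }
        self.leases.insert(
            element.to_string(),
            Lease { holder: user.to_string(), expires_at: now + ttl },
        );
        true
    }

    /// A write is only accepted while the writer's own lease is still live.
    pub fn may_write(&self, element: &str, user: &str, now: u64) -> bool {
        matches!(
            self.leases.get(element),
            Some(l) if l.holder == user && l.expires_at > now
        )
    }
}
```

With a 30-tick lease taken at time 0, a second user is refused at time 10 but can take over at time 31, after which the original holder's writes are rejected rather than merged.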

antidnan, over 1 year ago

I don't think you need a pure CRDT either but I think locking and presence is a bit of an oversimplification.

LWW is a good place to start, and updating the smallest piece of information possible is the right idea in general, but there is a lot more nuance to handling complex applications like a spreadsheet (I'm working on one) and whiteboard apps.

Things like reparenting or grouping shapes [1], or updating elements that aren't at the lowest scale like deleting a row or column in a spreadsheet, make locking challenging to implement. Do you lock the entire row while I'm moving it? Do you lock the entire group of shapes?

With the exception of text editing, the popular libraries like Yjs don't just give you a perfect CRDT out of the box. You still have to construct your data model in a way that enables small scale updates [2], and CRDT libraries and literature are the best source of thinking for these problems that I've found.

[1] https://www.figma.com/blog/how-figmas-multiplayer-technology-works/#syncing-trees-of-objects

[2] https://mattweidner.com/2022/02/10/collaborative-data-design.html#case-study-a-collaborative-spreadsheet

charles_f, over 1 year ago

That's true for a collaborative experience. CRDTs are a mechanism to handle eventual consistency (that's even the preface of the paper). If you assume that said collaborative experience is always online, you don't need them, and "using locks" as you described is probably enough.

If you want a mechanism to handle that eventual consistency, it's probably better to reuse their principles rather than reinventing something that will eventually resemble CRDTs.

You mentioned "offline first", so I think it's probably a good place to plug this: https://www.inkandswitch.com/local-first/

saqadri, over 1 year ago

You may not need a CRDT per se, but building a collaborative experience is so difficult. I worked on collaborative systems for a bit, and also have read a bit about how Figma and Notion do it (this is a good read: https://www.figma.com/blog/how-figmas-multiplayer-technology-works/) -- it's still super hard to get right.

This talk by Karri about Linear's "sync engine" is also a good watch: https://www.youtube.com/watch?v=Wo2m3jaJixU.

iamwil, over 1 year ago

> Ever-growing state: for CRDTs to work well they need to keep a record of both what exists, and what has been deleted (so that the deletes aren’t accidentally added back in later). This means that CRDT state will continually expand.

I guess a couple things:

It depends on the CRDT. Some CRDTs grow with the number of replicas and others with the number of events.

State-based CRDTs don't need to keep history and don't need causal ordering of messages, but internal bookkeeping grows with the number of replicas. And for large states (like sets and maps), it can be prohibitive to send the whole state over the wire for an idempotent merge.

That's why in practice people implement op-based CRDTs, which make the trade: in order to send small ops over the wire, we now need causal ordering of messages. To make sure we can sync with replicas that have been offline for a long time, we keep as much history as they need to catch up.

There are other variations, such as delta-state CRDTs that send diffs, and Merkle CRDTs, which use Merkle data structures to calculate diffs and detect concurrency. These have different growth characteristics.

---

As for growing state: is this actually a concern for devs that aren't using CRDTs for collaborative text? I can see it being an issue there, given the number of changes that can happen.

But outside of that, lots of data doesn't grow that fast. We all regularly use Git and it keeps a history of everything. Our disks are huge, and having an immutable record is great for lots of things (provided you can access it).

> Opaque state: ...you’re generally left with an opaque blob of binary encoded data.

Most CRDT libraries take a document-oriented angle. They assume that you can contain the entire "unit of work", like a document, inside of a CRDT. However, if your data is more relational, it doesn't quite fit. And while there's immutable data in a CRDT, I do wish it was more accessible and queryable. In addition, being a binary blob, it's not exactly composable.

I think CRDT libraries should be composable with each other.
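As an illustration of "bookkeeping grows with the number of replicas", here is a minimal grow-only counter (G-Counter), a standard state-based CRDT. The type and method names are my own; the point is that it stores one slot per replica and no history, and that `merge` is idempotent, so the whole state can be resent safely.

```rust
use std::collections::HashMap;

/// State-based grow-only counter: one slot per replica, no operation history.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GCounter {
    counts: HashMap<String, u64>,
}

impl GCounter {
    /// A replica only ever increments its own slot.
    pub fn increment(&mut self, replica: &str) {
        *self.counts.entry(replica.to_string()).or_insert(0) += 1;
    }

    /// The counter's value is the sum over all replica slots.
    pub fn value(&self) -> u64 {
        self.counts.values().sum()
    }

    /// Merge takes the per-replica maximum, which is commutative,
    /// associative, and idempotent.
    pub fn merge(&mut self, other: &GCounter) {
        for (replica, &n) in &other.counts {
            let slot = self.counts.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(n);
        }
    }
}
```

State size here is proportional to the number of replicas that have ever incremented, regardless of how many increments happened.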

earthboundkid, over 1 year ago

I've seen locks used at the CMSes of large news organizations. It's fine, but they all need a mechanism to kick out an editor who has an idle tab left open. For my own small scale CMS, I just wrapped Google Docs and let them handle all the syncing headaches.

chromatin, over 1 year ago

We took a super simple (IMO) approach to collaborative editing in my current project:

Each block of text has a version number which must be incremented by one by the client at the time of submission. The database provides conflict prevention via a uniqueness constraint, which bubbles up to the API code. The frontend is informed of the conflict, so that the user can be notified and a human being can perform conflict resolution.

Because most concurrent users are working on different blocks, this works great.
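A small sketch of that version-check flow. It swaps the database uniqueness constraint for an in-memory check, which is an assumption made to keep the example self-contained; `BlockStore`, `submit`, and `SubmitError` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum SubmitError {
    /// Someone else already wrote this version; the caller gets the current
    /// state back so a human can resolve the conflict.
    Conflict { current_version: u64, current_text: String },
}

/// In-memory stand-in for a table keyed by block id, holding (version, text).
#[derive(Default)]
pub struct BlockStore {
    blocks: HashMap<String, (u64, String)>,
}

impl BlockStore {
    /// Accept the write only if `new_version` is exactly the stored version + 1.
    pub fn submit(&mut self, id: &str, new_version: u64, text: &str) -> Result<(), SubmitError> {
        let (cur_version, cur_text) =
            self.blocks.get(id).cloned().unwrap_or((0, String::new()));
        if new_version != cur_version + 1 {
            return Err(SubmitError::Conflict {
                current_version: cur_version,
                current_text: cur_text,
            });
        }
        self.blocks.insert(id.to_string(), (new_version, text.to_string()));
        Ok(())
    }
}
```

If two clients both read version 1 and both submit version 2, the first succeeds and the second gets `Conflict` with the winner's text, which the frontend can show to the user.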

namelosw, over 1 year ago

That's not gonna work for real-world projects. Real-world apps often have edits larger than a single locked cell/card, e.g. moving columns or replacing large chunks of a spreadsheet in Google Sheets, or Ctrl-A to select all and then drag to move.

Also, if you consider latency, locking does not work well, because client B might perform operations before it has even received the lock notification from client A.

lmm, over 1 year ago

> You can’t inspect your model represented by the CRDT without using the CRDT library to decode the blob, and you can’t just store the underlying model state because the CRDT needs its change history also. You’re left with an opaque blob of data in your database. You can’t join on it, you can’t search it, you can’t do much without building extra features around that state blob.

So use the CRDT library when building your indices? Or better yet use a CRDT-aware datastore. This doesn't seem like a real problem.

> Locking for safety

Please don't. You're inevitably going to have lost locks, lost updates, or most likely both.

spion, over 1 year ago

For offline-first apps, or for applications where a very high degree of control over the content is needed (e.g. legal docs) and realtime collaboration isn't that valuable, there is also the option to use a 3-way merge instead.

The benefit is that you can even allow the user to resolve conflicts in a satisfactory way.

Another benefit is that the document doesn't even have to be derived from the original. It could go through exports and re-imports and it will still be possible to run a 3-way merge as long as a common base version is declared. This can be especially convenient for systems that involve e.g. MS Word.
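A minimal sketch of a 3-way merge, under the simplifying assumption that a document is a map from paragraph id to text (real tools usually merge at line or character level). `Doc`, `doc`, `merge3`, and `Conflict` are hypothetical names; the rule is the usual one: if only one side changed relative to the base, take that side, and if both changed differently, report a conflict for the user.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified document model: paragraph id -> paragraph text.
pub type Doc = BTreeMap<String, String>;

/// Convenience constructor for examples.
pub fn doc(pairs: &[(&str, &str)]) -> Doc {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

#[derive(Debug, PartialEq)]
pub struct Conflict {
    pub key: String,
    pub ours: Option<String>,
    pub theirs: Option<String>,
}

/// Merge `ours` and `theirs`, both derived from the declared common `base`.
pub fn merge3(base: &Doc, ours: &Doc, theirs: &Doc) -> (Doc, Vec<Conflict>) {
    let keys: BTreeSet<&String> =
        base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Doc::new();
    let mut conflicts = Vec::new();
    for key in keys {
        let (b, o, t) = (base.get(key), ours.get(key), theirs.get(key));
        let winner = if o == t {
            o // both sides agree (including both deleted)
        } else if o == b {
            t // only their side changed
        } else if t == b {
            o // only our side changed
        } else {
            // Both changed differently: record it and keep ours provisionally.
            conflicts.push(Conflict { key: key.clone(), ours: o.cloned(), theirs: t.cloned() });
            o
        };
        if let Some(v) = winner {
            merged.insert(key.clone(), v.clone());
        }
    }
    (merged, conflicts)
}
```

The returned conflict list is what a UI would present to the user for manual resolution.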

mweidner, over 1 year ago

> Opaque State: [...] You can’t inspect your model represented by the CRDT without using the CRDT library to decode the blob, and you can’t just store the underlying model state because the CRDT needs its change history also. You’re left with an opaque blob of data in your database.

As someone who works on a CRDT library with opaque state [1], I agree that this is a big barrier to adoption. Features like partial loading, per-paragraph permissions, and accept/reject suggestions seem pretty easy to implement if each text char is just a row in your server's DB, but I would have trouble implementing them on top of e.g. Yjs.

For text editing, one idea is to separate the CRDT "positions" from the text itself, which you can then store as a map (position -> char) in your own data structures. I've made a simple (but inefficient) library along these lines [2] and would be interested in ideas for further development.

[1] Collabs - https://collabs.readthedocs.io

[2] position-strings - https://www.npmjs.com/package/position-strings
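A toy sketch of the "map (position -> char)" idea. It does not reproduce how position-strings generates positions; as a loudly labeled assumption it uses naive integer midpoints plus a replica id as tie-breaker, which runs out of room after enough inserts in one spot. `Position`, `TextMap`, and `between` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A sortable key; the replica id keeps concurrent inserts at the same
/// midpoint distinct and deterministically ordered.
pub type Position = (u64, u32);

/// The text is just an ordered map you own and can query like any other data.
#[derive(Default)]
pub struct TextMap {
    chars: BTreeMap<Position, char>,
}

impl TextMap {
    /// Create a position strictly between `left` and `right`
    /// (None means start or end of text). Callers must pass left < right.
    /// Returns None when the integer gap is exhausted, a limit of this naive scheme.
    pub fn between(left: Option<Position>, right: Option<Position>, replica: u32) -> Option<Position> {
        let lo = left.map_or(0, |p| p.0);
        let hi = right.map_or(u64::MAX, |p| p.0);
        let mid = lo + (hi - lo) / 2;
        if mid == lo { None } else { Some((mid, replica)) }
    }

    pub fn insert(&mut self, pos: Position, ch: char) {
        self.chars.insert(pos, ch);
    }

    pub fn delete(&mut self, pos: Position) {
        self.chars.remove(&pos);
    }

    /// Reading the text is an in-order walk of the map.
    pub fn text(&self) -> String {
        self.chars.values().collect()
    }
}
```

Since each character is one (position, char) entry, it could equally be one row in a server database, which is what makes features like partial loading straightforward.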

matlin, over 1 year ago

I think the most important part of designing collaborative software, which this touches on a bit, is having the right granularity and scope for a given change.

Last-writer-wins is only bad when the granularity of what you're editing is too big. E.g. if you're an editor like Figma and each element is a row in a database, a single row is too big. Instead you want attribute-level granularity so two users can change independent properties (like one color and the other size) without bulldozing each other.

The other key thing is to not consider only realtime collaboration (a common mistake). In practice, there's always some delay (maybe just milliseconds, but it could be hours or days) in how events propagate, so solutions like locking don't work.

The reality is that any client-server system that needs to be highly interactive and robust to unreliable network conditions is undeniably a distributed system and therefore warrants using distributed system solutions like vector clocks, Lamport timestamps, CRDTs, etc.

Last thing is that I think many people only think of operation-based CRDTs when they think about CRDTs. You can create (and we have at my company) a fairly traditional-feeling database that relies on a state-based CRDT solution that doesn't need to maintain a log of every operation that has ever happened.

So yes, you might not need to reach for a fancy library like Yjs or Automerge, but it's worth understanding how these things basically work because many of them are extremely simple and easy to grok - the complicated parts of Yjs and Automerge are the sophisticated data structures and algorithms that are pretty much only needed for large document text editing.
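A minimal sketch of attribute-level last-writer-wins ordered by Lamport timestamps, as a state-based merge with no operation log. This is not the commenter's database; `Element`, `Stamp`, and the (counter, replica id) tie-break are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// (Lamport counter, replica id), compared lexicographically so that
/// equal counters are tie-broken by replica id.
pub type Stamp = (u64, u32);

/// One element whose attributes are independent LWW registers.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Element {
    attrs: HashMap<String, (Stamp, String)>,
}

impl Element {
    /// Keep the write only if its stamp is newer than what is stored.
    pub fn set(&mut self, attr: &str, value: &str, stamp: Stamp) {
        let newer = self
            .attrs
            .get(attr)
            .map_or(true, |(existing, _)| stamp > *existing);
        if newer {
            self.attrs.insert(attr.to_string(), (stamp, value.to_string()));
        }
    }

    pub fn get(&self, attr: &str) -> Option<&str> {
        self.attrs.get(attr).map(|(_, v)| v.as_str())
    }

    /// State-based merge: replay the other replica's current registers.
    pub fn merge(&mut self, other: &Element) {
        for (attr, (stamp, value)) in &other.attrs {
            self.set(attr, value, *stamp);
        }
    }
}
```

One user changing `color` and another changing `size` never collide; only two writes to the same attribute compete, and the higher stamp wins on every replica.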

socketcluster, over 1 year ago

The no-code serverless platform I built achieves this behind the scenes via a real-time CRUD API: https://saasufy.com/

The key is to perform updates on fields individually. Normally this would not be viable using HTTP due to headers/overheads (too many fields per resource to dedicate an entire HTTP request per field), but it is viable over WebSockets as each frame is very lightweight and frames can even be batched. Also, being able to tie together the life of the connection to the subscription is handy to ensure that no real-time updates can be missed.

I built a chat app with authentication + access control with it (you can log in with GitHub at the bottom): https://saasufy.github.io/chat-app/

Only 120 lines of HTML markup (web components), no custom JS. See GitHub repo here for the 'source': https://github.com/Saasufy/chat-app

czx111331, over 1 year ago

We are addressing the CRDT downsides mentioned in the article at Loro:

- Ever-growing state. This is no longer an issue. With OT-like CRDTs, you can discard unnecessary historical data at any time. This is theoretically feasible, and we are moving towards this goal.
- Complex implementation. The complexity is internal within the package, and it's written in Rust, making it universally applicable.
- Opaque state. We aim to expose these internal states through improved DevTools, making them easier to control and observe. This is one of the essential steps in enhancing our DX.

You can visit our blog to learn more: https://www.loro.dev/blog/loro-now-open-source

maclockard, over 1 year ago

Wrote about something similar a while ago: https://hex.tech/blog/a-pragmatic-approach-to-live-collaboration/

Using a server to tie break and locking has worked pretty well for us.

aboodman, over 1 year ago

It's true that a CRDT is often not the right thing for a classic client/server application. But this doesn't mean we should just give up on UX and use locking.

There are approaches to multiplayer that are client/server native. By leveraging the authoritative server they can offer features that CRDTs can't, while preserving the great UX.

I'm partial to server reconciliation: https://www.gabrielgambetta.com/client-side-prediction-server-reconciliation.html

My product, Reflect, implements server reconciliation as a service. You can learn more about how it works here: https://rocicorp.dev/blog/ready-player-two
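For readers unfamiliar with the term, here is a small sketch of client-side prediction with server reconciliation on a single integer of state. It is not Reflect's API; `Client`, `Mutation`, and `on_server_update` are hypothetical, and it assumes the server reports its authoritative state together with the id of the last client mutation it has processed.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mutation {
    Add(i64),
    Set(i64),
}

/// The same mutation logic runs on the client (speculatively) and on the server.
pub fn apply(state: i64, m: Mutation) -> i64 {
    match m {
        Mutation::Add(n) => state + n,
        Mutation::Set(n) => n,
    }
}

pub struct Client {
    confirmed: i64,                     // last authoritative state from the server
    pending: VecDeque<(u64, Mutation)>, // local mutations not yet acknowledged
    next_id: u64,
}

impl Client {
    pub fn new(initial: i64) -> Self {
        Client { confirmed: initial, pending: VecDeque::new(), next_id: 1 }
    }

    /// Queue a mutation for the server; it shows up in `view` immediately.
    pub fn mutate(&mut self, m: Mutation) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.pending.push_back((id, m));
        id
    }

    /// What the UI shows: confirmed state with pending mutations replayed on top.
    pub fn view(&self) -> i64 {
        self.pending.iter().fold(self.confirmed, |s, &(_, m)| apply(s, m))
    }

    /// Adopt the server's state and drop the mutations it has already processed.
    pub fn on_server_update(&mut self, state: i64, last_processed: u64) {
        self.confirmed = state;
        self.pending.retain(|&(id, _)| id > last_processed);
    }
}
```

The server stays the single source of truth, yet local edits appear instantly; any mutation the server has not yet seen is simply replayed on top of whatever state it sends back.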

swyx, over 1 year ago

> I’ll run through a bunch of broad categories of applications, and describe how to make use of these features.

i love these kinds of taxonomies of apps, because then you can get specific about tech stack choices. just offering a couple more that i've come across in my years:

- more prototypical: 7GUIs https://eugenkiss.github.io/7guis/tasks
- Application holotypes: https://twitter.com/arthurwuhoo/status/1470489178186170374

parhamn, over 1 year ago

At this point, given the maturity of libraries (I was exploring this recently), I think you'd have to make the case that CRDTs are bad, not just "too much".

Interfacing with the 'blob' is a real thing (Yjs is solving some of this with a Rust implementation that has cross-language bindings), but generally the things they noted (e.g. a Figma canvas) aren't things you commonly do joins across, and if you did you'd have an independent indexing store for that functionality.

With tools like SyncedStore [1] and HocusPocus [2] you end up with a pretty good, well-tested, easy-to-implement base for good collaboration.

[1] syncedstore.org

[2] github.com/ueberdosis/hocuspocus

speps, over 1 year ago

The Wikipedia page for Operational Transformation [1] mentions Differential Synchronization [2] as an alternative. Does anyone have any experience with DS?

[1] https://en.wikipedia.org/wiki/Operational_transformation

[2] https://neil.fraser.name/writing/sync/

andyjohnson0, over 1 year ago

Because it doesn't seem to be defined in the article:

CRDT = conflict-free replicated data type [1]

I keep seeing this acronym and mentally parsing it as CRT, at which point I get very confused.

[1] https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type

iveqy, over 1 year ago

Developers have solved this in most VCSes by merging changes. The trouble with merging is that it's hard to explain and hard to visualize for non-technical users. Once that problem is solved, there are a lot of nice tools to use for collaborative experiences.

insanitybit, over 1 year ago

> Ever-growing state: for CRDTs to work well they need to keep a record of both what exists, and what has been deleted (so that the deletes aren’t accidentally added back in later). This means that CRDT state will continually expand. There’s a bunch of magic that CRDT library authors are doing with clever compression techniques to make this problem less-bad, but it’s basically in-escapable. The size of your CRDT state is not purely a function of the size of the state the CRDT represents, but also of the number of updates that state has gone through.

A) This is only the case for certain CRDTs, such as sets that support deletion - so, if you want Set semantics with deletion support, you need two sets, one that tracks all deletions and one that tracks all insertions.

B) You can garbage collect your sets. They don't have to grow forever.

> Complex implementations: CRDTs are easy to implement wrong, so probably don’t roll your own.

Personally, I've never done this. I've just added a `merge(&mut self, other: &Self)` method to structs in Rust. Guaranteeing CRDT properties is often trivial, or at least it was in my case.

> Opaque state: Because the CRDT has to represent both the underlying state and the updates that led to that state

Again, this is only if you need specific operations on your CRDTs and if your CRDTs are encoded in specific ways.

I've said it before, but a trivial CRDT looks like this:

<pre><code>use std::cmp::max;

// A grow-only value: merging keeps the larger of the two, so merge is
// commutative, associative, and idempotent.
struct Grows(u64);

impl Grows {
    fn merge(&mut self, other: &Self) {
        self.0 = max(self.0, other.0);
    }
}
</code></pre>

et voila?

Obviously you lose all intermediary states, but since that is specified to be a negative thing, I just want to be clear that it's often optional.

> So maybe you are convinced that CRDTs are not the be-all-and-end-all of collaboration, and that you aren’t in one of the two categories where you probably should use a CRDT, and you’ve made it this far in the post.

I am convinced that CRDTs are not the be-all-and-end-all, because Strong Eventual Consistency does not provide strong enough guarantees for all use cases.

Once again we have a CRDT article that's about user collaboration, which I find somewhat frustrating because CRDTs can be used in far more places than that, and user collaboration is like the most complicated thing you could ever write since it's all of the problems of a distributed system and then we add humans into the mix. There is no "good" solution to this problem - CRDTs aren't going to solve it, and neither is any other algorithm, because it's not possible to encode every possible state update in a way that never conflicts and is also what a human expects (especially since humans have varying expectations).

The algorithm/approach, as described, seems perfectly fine - it will have edge cases just like CRDTs will. In reality, for such an impossibly complex problem, you're probably going to end up with something really complex to solve it. You're almost certainly going to start adding CRDT-like operations, like "ok technically this user held a lock on X, but the other user performed an operation on X that technically commutes, so we can allow both" to alleviate some of the inherent complexities (and UX issues) with locking.
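The "two sets" design in point A is commonly called a two-phase set (2P-Set). A minimal sketch, with names of my own choosing, shows why the tombstone set makes state grow and why an element cannot be re-added once removed:

```rust
use std::collections::HashSet;

/// Two-phase set: one grow-only set of insertions, one grow-only set of
/// removals (tombstones). Both only ever grow, so merge is a plain union.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TwoPhaseSet {
    added: HashSet<String>,
    removed: HashSet<String>,
}

impl TwoPhaseSet {
    pub fn insert(&mut self, item: &str) {
        self.added.insert(item.to_string());
    }

    /// Removal is recorded as a tombstone rather than deleting from `added`.
    pub fn remove(&mut self, item: &str) {
        if self.added.contains(item) {
            self.removed.insert(item.to_string());
        }
    }

    /// Present means added and never removed; removal wins permanently.
    pub fn contains(&self, item: &str) -> bool {
        self.added.contains(item) && !self.removed.contains(item)
    }

    pub fn merge(&mut self, other: &Self) {
        self.added.extend(other.added.iter().cloned());
        self.removed.extend(other.removed.iter().cloned());
    }
}
```

Garbage collection as in point B would mean dropping tombstones once every replica is known to have seen them, which requires some coordination that this sketch leaves out.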

shove, over 1 year ago

As someone on the UX team for a product that just uses locks: LOL. Ok, but the suck index is pretty high.

jes5199, over 1 year ago

CRDT is a different paradigm. Ideally we'd use it to replace client-server.

ivanjermakov, over 1 year ago

Conflict-Free Replicated Data Type (CRDT) is a type of data structure that enables concurrent updates across multiple replicas without the need for coordination between them.
