Great post. This one always brings a smile to my face:<p>> Every component is crash-only<p>I was part of the team that developed a distributed, five-9's control system for an industry where downtime costs millions per minute and comes with a federal investigation if it lasts long enough. On top of that, the industry is made up of competitors that explicitly distrust each other, so all components had to be <i>truly</i> distributed, with no central coordination for anything.<p>Given the requirements, we decided to explicitly adopt a crash-only approach. Between idempotent operations, horizontal scaling, and fast restart times, we could make failing components not impact SLAs (and we had testing to ensure it).<p>Once it finally got out into the field (which, in an industry this risk averse, takes years), it turned out they <i>really</i> did not like software crashing. They interpreted crashing as bad quality, and no amount of "we do it on purpose to ensure correctness" was going to make them happy.
I enjoyed reading that a lot.<p>> The main advantage of distributed transactions is that they make distributed systems look less distributed by choosing CP, but that inherently trades off availability!<p>This is true, but I suspect that it's slightly missing the important thing about transactions. A transaction is an operation that takes the database from one Consistent (ACID "C", you can think about it as "legal under the business logic") state to another Consistent state. Linearizability (CAP "C") isn't enough to do that, because often changes in databases require "take from Bob and give to Alice", or "check Bob and Alice's balance and add the order", neither of which fits well into Linearizability's single-key definition of the world. Allowing developers to think about a stream of operations that moves the database from one legal state to another is super powerful. The whole point is that it provides an abstraction that hides concurrency (ACID "I") and partial failure (ACID "A"). Saving developers from reasoning about those is a big win!<p>> I should also note that while Distributed Transactions might be a useful tool in building idempotency, simply wrapping a non idempotent operation (e.g. “add 1 to X”) in a transaction does not make it idempotent.<p>The OP is right that this isn't a panacea, especially where those transactions aren't idempotent. But transactions are a mechanism to implement idempotence ("insert order number 10 if it isn't there already"), and idempotence and ACID "C" can be really hard to achieve without transactions (or at least "I" and "A").<p>Transactions, CRDTs, and the CALM theorem are linked too. You can definitely have transactions in systems that aren't CAP "C" consistent, and still have them do legal things. The CALM theorem lays out one way to think about those, and CRDTs are a kind of object-oriented embodiment of that theory.
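To make the "insert order number 10 if it isn't there already" point concrete, here's a minimal sketch using sqlite3 purely for illustration (the table and column names are made up):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)")

    def place_order(order_id, amount):
        # The transaction makes check-then-insert atomic; the primary key plus
        # OR IGNORE makes the whole operation idempotent, so retries are no-ops.
        with conn:  # BEGIN ... COMMIT, or ROLLBACK on exception
            conn.execute(
                "INSERT OR IGNORE INTO orders (order_id, amount) VALUES (?, ?)",
                (order_id, amount),
            )

    place_order(10, 500)
    place_order(10, 500)  # retried/duplicate request: no new row, no error
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 1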
This was a great post and covers many day-to-day topics that practitioners tend to hand-wave over, especially as distributed systems become more pervasive. Dare I say some of the statements are even becoming clichés.<p>The section on distributed transactions could use a little more nuance, particularly the example about the counter: I suspect any system offering transactions also has a CAS operation. Additionally, the benefit of a transactional system is that you can offer bounded counters, whereas an AP or “strong” EC (CRDT) system cannot.
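To illustrate the bounded-counter point, a toy sketch (the names are invented, not from any particular system): the check and the decrement happen atomically inside one transaction, which is what lets the bound actually hold.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (id TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO counters VALUES ('tickets', 3)")

    def take_one(counter_id):
        # Bounded counter: the condition and the write are one atomic statement
        # inside a transaction, so the value can never drop below zero.
        with conn:
            cur = conn.execute(
                "UPDATE counters SET value = value - 1 WHERE id = ? AND value > 0",
                (counter_id,),
            )
            return cur.rowcount == 1

    print([take_one("tickets") for _ in range(5)])  # [True, True, True, False, False]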
One disadvantage of people very familiar with distributed systems is that they might not try hard enough to avoid building unnecessarily distributed systems.
Beyond the pedantic distinction, is there any real point to not calling "at-least-once delivery with idempotent processing" exactly-once processing? I can't imagine that any external observers would be able to tell.
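For what it's worth, the pattern is just dedup at the consumer (sketch below; the message IDs and handler are made up, and a real system would keep the seen-ID set in durable storage, updated atomically with the effect):

    processed_ids = set()

    def apply_effect(payload):
        print("applied", payload)

    def handle(message):
        if message["id"] in processed_ids:
            return  # duplicate delivery: observably a no-op
        apply_effect(message["payload"])
        processed_ids.add(message["id"])

    # The broker may deliver the same message more than once...
    handle({"id": "m-1", "payload": 42})
    handle({"id": "m-1", "payload": 42})
    # ...but an external observer sees the effect exactly once.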
Am I missing why a distributed lock is an impossibility? The problem stated is that a partitioned node can't know it has lost the lock, but this is only an issue if there is a way to lose the lock short of returning it.<p>Which I guess is to say: what difference is there between a lease with an infinite timeout unless manually returned, and a "lock"?<p>Certainly the system deadlocks under partition but I'm not sure why that makes this "impossible".
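To make my own question concrete, a toy single-node sketch (everything here is invented): the only difference between the lease and the "lock" is whether an expiry ever gets set, and ttl=None is exactly the case that deadlocks if the holder partitions away before calling release.

    import time

    leases = {}  # resource -> (holder, expires_at); expires_at=None means "never"

    def acquire(resource, holder, ttl=None):
        now = time.monotonic()
        current = leases.get(resource)
        if current is not None:
            _, expires_at = current
            if expires_at is None or expires_at > now:
                return False  # still held; with ttl=None this behaves exactly like a lock
        leases[resource] = (holder, None if ttl is None else now + ttl)
        return True

    def release(resource, holder):
        if resource in leases and leases[resource][0] == holder:
            del leases[resource]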
This article is one of the few I've ever read on distributed systems design that is even remotely close to what I'd call "correct". The amount of disagreement I've had with colleagues when I propose crash-only modes of operation is unbelievable.<p>Follow the guidelines in this post and you'll indeed end up with (more) robust systems. Great writeup.
Loved this!<p>And here are a few more positive phrases: "total order", "committed/uncommitted log", "replicated state machine", "logical timestamp", "the network is not homogeneous", "bandwidth is not infinite", "tail latency tolerance", "fault model", "message passing", "recovery", "cascading failure", "metastable", "strict serializability", and "stable storage"!<p>Surprisingly, at least to me, it's jarring to hear the phrase "CAP" because the field is far more nuanced. That "C" alone has so many different flavors! Far more interesting to talk about the FLP result.<p>Watch out also for "RPC" because that's not isomorphic with optimal consensus protocols, where messages follow multi-path routing (A->C->B->A). Granted, RAFT uses the term "RPC" (A->B->A) but that's more a concession to accessibility and not the broadest way to think of distributed systems, or necessarily the best way to implement them. Message passing is simpler and more realistic as it helps you to see and think about the network/process fault model more clearly.<p>Distributed testing techniques are also moving towards autonomous deterministic testing, as pioneered by FoundationDB, where the database itself is the test simulator—these tend to get into more interesting state spaces that can also be replayed instantly from a seed, compared to external test harnesses that run for hours in real time and that can't reproduce distributed bugs deterministically in exactly the same way every time.
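To show what I mean by seed-replayable simulation, here's a toy sketch (nothing like FoundationDB's actual simulator, just the core idea): every bit of nondeterminism, message ordering and fault injection alike, comes from one seeded PRNG, so any failing run replays exactly from its seed.

    import random

    def simulate(seed, steps=10):
        rng = random.Random(seed)
        # Multi-path message passing (A->C->B->A) rather than strict RPC pairs.
        in_flight = [("A", "C", "req"), ("C", "B", "fwd"), ("B", "A", "ack")]
        history = []
        for _ in range(steps):
            if not in_flight:
                break
            msg = in_flight.pop(rng.randrange(len(in_flight)))
            if rng.random() < 0.2:
                history.append(("dropped", msg))    # injected network fault
            else:
                history.append(("delivered", msg))
        return history

    assert simulate(42) == simulate(42)  # same seed, same schedule, same bugs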
> database vendors might try just a little harder to tell the truth...<p>Come on, you know that's not what's going to happen. If they notice at all, they'll just incorporate the magic phrases into their BS so you have to hunt harder for a real signal.
I'm a newbie and a little confused. On one hand there are posts like this that claim exactly-once delivery and distributed locks are impossible. But on the other hand, if I look at the docs of a distributed database, say Apache Ignite, they will say that they have exactly-once delivery [1] and distributed locks [2]. So ... which is it?<p>[1] <a href="https://ignite.apache.org/docs/latest/key-value-api/continuous-queries#events-delivery-guarantees" rel="nofollow">https://ignite.apache.org/docs/latest/key-value-api/continuo...</a><p>[2] <a href="https://ignite.apache.org/docs/latest/distributed-locks" rel="nofollow">https://ignite.apache.org/docs/latest/distributed-locks</a>
Great post!<p>A few more positive shibboleths.<p>One would be eventual consistency.<p>Another would be discussing write paths vs read paths (or patterns) and recognizing that those can be decoupled (or a mention of CQRS).
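A hand-rolled sketch of the decoupled write/read path idea (not any particular framework; names are invented): writes append to an event log, and a read-side projection catches up asynchronously, so reads are eventually consistent with writes.

    event_log = []   # write path: append-only events
    balances = {}    # read path: derived view
    projected = 0    # how far the read model has caught up

    def deposit(account, amount):
        event_log.append(("deposited", account, amount))

    def catch_up():
        global projected
        for _kind, account, amount in event_log[projected:]:
            balances[account] = balances.get(account, 0) + amount
        projected = len(event_log)

    deposit("alice", 100)
    deposit("alice", 50)
    catch_up()                # until this runs, reads lag behind the writes
    print(balances["alice"])  # 150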
It is notable that the linked article [1] is critical of those that don't understand these concepts, then goes on to misunderstand some of the concepts.<p>[1] <a href="https://codahale.com/you-cant-sacrifice-partition-tolerance/" rel="nofollow">https://codahale.com/you-cant-sacrifice-partition-tolerance/</a>
For those curious, here's the origin of the word "Shibboleth"<p><a href="https://www.sefaria.org/Judges.12.6?ven=Tanakh:_The_Holy_Scriptures,_published_by_JPS&vhe=Miqra_according_to_the_Masorah&lang=bi" rel="nofollow">https://www.sefaria.org/Judges.12.6?ven=Tanakh:_The_Holy_Scr...</a>
I liked the article a lot.<p>I prefer the term retryable to idempotent. If there's a failure on the first call, to be truly idempotent it should fail on the second call too.<p>Retryable, on the other hand, is easier to reason about: the important thing is not the response but the end state of the system.
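A small sketch of what I mean (the store and names are made up): the retried call can legitimately return a different response, but the end state of the system is the same, which is the property I actually care about.

    users = {}

    def create_user(user_id, name):
        if user_id in users:
            return {"created": False}   # the response differs on retry...
        users[user_id] = name
        return {"created": True}

    first = create_user("u1", "Ada")
    retry = create_user("u1", "Ada")
    assert first != retry               # not "idempotent" by the strict same-response reading
    assert users == {"u1": "Ada"}       # ...but retryable: the end state converged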