John Ousterhout [0], Raft's co-inventor, ran a user study (to hammer home the point that <i>Raft</i> is easier to understand than <i>Paxos</i>), and the presentations, presumably filmed to explain the two protocols to the study's participants, are up on YouTube.<p>Paxos: <a href="https://www.youtube-nocookie.com/embed/JEpsBg0AO6o" rel="nofollow">https://www.youtube-nocookie.com/embed/JEpsBg0AO6o</a><p>Raft: <a href="https://www.youtube-nocookie.com/embed/YbZ3zDzDnrw" rel="nofollow">https://www.youtube-nocookie.com/embed/YbZ3zDzDnrw</a><p>There's also this animated presentation, which I find pretty nifty: <a href="http://thesecretlivesofdata.com/raft/" rel="nofollow">http://thesecretlivesofdata.com/raft/</a><p>[0] <a href="https://web.stanford.edu/~ouster/cgi-bin/home.php" rel="nofollow">https://web.stanford.edu/~ouster/cgi-bin/home.php</a>
Raft has spawned a huge ecosystem of consensus libraries across many languages. This has 'democratized' consensus and made it easier to build distributed systems.<p>Is it perfect? No. As another comment here points out, Raft might not be as fast as some more complicated Paxos variants, but that's okay for most use cases, and it is much easier to reason about.<p>To give an example of how powerful this is: I started implementing high availability for an open-source instant search engine I've been working on (<a href="https://github.com/typesense/typesense" rel="nofollow">https://github.com/typesense/typesense</a>) and was able to get a basic clustering solution working in just a few days.<p>Before Raft, this would have taken a few weeks or would have required an external coordinator like ZooKeeper. In that sense, Raft has unlocked a lot of opportunities.
I have to admit that over time I've ended up with mixed feelings about this paper, though mainly because people read it without knowing much about consensus and draw conclusions like "Raft is better than Paxos" or "Raft is the best consensus algorithm". Some thoughts (please elaborate if you think I'm oversimplifying or if you disagree with me!):<p>First of all, remember that Paxos is <i>a family of protocols</i> for solving consensus. When doing research it's useful to reduce a problem into smaller and smaller parts. The "standard" Paxos algorithm is a very simple consensus algorithm which can only decide a single value once. It's not practical at all, but it provides a good framework for thinking about consensus.<p>When this article compares "Raft vs Paxos", it is <i>actually</i> comparing Raft against a standard way of configuring Paxos with a leader (MultiPaxos). Note that MultiPaxos allows a lot of nuance in implementations while still being called "MultiPaxos". MultiPaxos is not a spec you implement; it's a set of ideas.<p>Raft, on the other hand, is a concrete protocol with well-defined, specified behavior. In fact, Raft is essentially an implementation of MultiPaxos[1]. This is a very good thing! Paxos provides a framework for thinking about consensus, while Raft puts some of these ideas into a concrete specification which is easy to implement. And it's a good point that we should make the knowledge in the field of consensus available to a wider audience. Yay, Raft is good!<p>And here comes the problem: a lot of people have read the Raft paper and concluded that "Raft is the best way of solving consensus".
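To make the "decides a single value once" point concrete, here is a minimal sketch of single-decree Paxos in Python. It assumes in-memory acceptors, no message loss, and caller-chosen unique ballot numbers; the names (`Acceptor`, `propose`) are hypothetical and not from any library. Roughly speaking, MultiPaxos runs one such instance per log slot, with a stable leader running phase 1 once for many slots.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, str]]]:
        # Phase 1b: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2b: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """One proposer round; returns the chosen value, or None if it failed."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: gather promises; give up without a majority.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, adopt the one with the
    # highest ballot instead of our own: this is what makes the choice stick.
    previous = [prev for prev in promises if prev is not None]
    if previous:
        value = max(previous)[1]
    # Phase 2a: ask for acceptance; the value is chosen once a majority accepts.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(3)]
    print(propose(acceptors, 1, "A"))  # A
    print(propose(acceptors, 2, "B"))  # A  (forced to re-propose the chosen value)
    print(propose(acceptors, 1, "C"))  # None  (stale ballot is rejected)
```

Once ballot 1 has chosen "A", the ballot-2 proposer learns about it in phase 1 and must re-propose "A" instead of "B". That's all "standard" Paxos gives you: one value, decided once.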
Raft is (relatively) easy to implement and get started with, and it gives you a very simple model to program against (a log of commands), but it's far from a panacea.<p>The most important thing to know about Raft is that it's neither performant (every command has to be sent to a single leader, which becomes a bottleneck) nor scalable (every command needs to be processed by all nodes). etcd supports "1000s of writes" and recommends up to "7 nodes".<p>This doesn't mean that Raft is <i>bad</i>; it's just a trade-off you need to be aware of: simplicity vs performance. If you're integrating Raft into your stack and aim for scalability/performance, you must always be very wary of when you use it. You should minimize writes at all costs. Unfortunately, many developers get the impression that you can just plug Raft into an existing system and suddenly have a performant and scalable distributed system.<p>A good example is CockroachDB: they use plain Raft for writes, but use "leader leases" for scaling reads. Suddenly things become a lot more complicated (for instance, see this issue about how leader leases are implemented in HashiCorp's Go library for Raft: <a href="https://github.com/hashicorp/raft/issues/108" rel="nofollow">https://github.com/hashicorp/raft/issues/108</a>). I'm sorry, but you're going to have to get your hands dirty if you want something that's both fast and correct.<p>The end result is that you have two choices: (1) use a library which provides a simple model (a log of commands) but doesn't scale well, or (2) use a more complicated consensus algorithm and deal with all of the Hard Problems™ that come with it.
If you're going for the second option, you might as well take advantage of all the research from the last few years (see <a href="https://vadosware.io/post/paxosmon-gotta-concensus-them-all/" rel="nofollow">https://vadosware.io/post/paxosmon-gotta-concensus-them-all/</a>).<p>It should also be noted that even though the <i>consensus</i> algorithm doesn't scale, that doesn't mean your system can't scale. Scalog (<a href="https://www.usenix.org/system/files/nsdi20-paper-ding.pdf" rel="nofollow">https://www.usenix.org/system/files/nsdi20-paper-ding.pdf</a>) is an example of a system which uses a consensus algorithm at a constant rate (i.e. regardless of the system's load). Once again: focus on how you can <i>avoid</i> using a consensus algorithm, given the way <i>your</i> system works.<p>TL;DR: Hard problems are still hard. Don't think that Raft is a magical piece of software which solves all of your problems.<p>[1]: There are some nuances between Raft and "standard" MultiPaxos, as mentioned in <a href="https://arxiv.org/abs/2004.05074" rel="nofollow">https://arxiv.org/abs/2004.05074</a>. I would still consider Raft to be in the same class as MultiPaxos (compared to other approaches to consensus).
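On the leader-bottleneck point: in Raft every write is appended at the leader and only counts as committed once a majority of servers store it. Below is a minimal Python sketch of the leader's commit rule from Figure 2 of the Raft paper (advance commitIndex to the highest N stored on a majority, provided log[N] is from the current term). It assumes 1-based log indices and a `match_index` list that includes the leader's own last index; the function name is hypothetical and not taken from etcd or hashicorp/raft.

```python
from typing import List


def advance_commit_index(commit_index: int, match_index: List[int],
                         log_terms: List[int], current_term: int) -> int:
    """Leader-side commit rule from Figure 2 of the Raft paper.

    match_index: highest log index known to be stored on each server,
                 including the leader's own last index.
    log_terms:   log_terms[i] is the term of entry i (1-based; slot 0 unused).
    """
    n = len(match_index)
    # Sorted descending, the entry at position n // 2 is stored on at least
    # n // 2 + 1 servers, i.e. a majority: the highest such index.
    candidate = sorted(match_index, reverse=True)[n // 2]
    # Raft only commits entries from the leader's current term by counting
    # replicas. Terms never decrease along the log, so if the candidate is
    # from an older term, no lower index qualifies either.
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index


if __name__ == "__main__":
    terms = [0, 1, 1, 2, 2, 2, 2, 2]  # entries 1..7
    # 5 servers; entries up to 5 are on 3 of them (a majority).
    print(advance_commit_index(0, [7, 7, 5, 3, 2], terms, current_term=2))  # 5
    # The majority only has up to entry 2, which is from term 1: not committed yet.
    print(advance_commit_index(0, [7, 2, 2, 1, 0], terms, current_term=2))  # 0
```

Because every write waits on this leader-driven majority, adding servers makes the quorum larger rather than adding write capacity, which is the trade-off described above.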
Bryan Ford's Threshold Logical Clocks[0] is a great read.<p>[0] <a href="https://arxiv.org/pdf/1907.07010.pdf" rel="nofollow">https://arxiv.org/pdf/1907.07010.pdf</a>
A pet peeve I have about this space is that "consensus" is not actually a practically interesting problem. The interesting problem for applications is "atomic broadcast" or "total order broadcast", which is theoretically equivalent to consensus but much closer to the needs of applications. The result has been that many important atomic broadcast algorithms have been ignored while Paxos and Raft steal the spotlight in "distsys" pop culture.<p>Some of these algorithms need only f+1 nodes to tolerate f faults (rather than 2f+1 as with Paxos or Raft), at the cost of requiring an external reconfiguration service (almost certainly Paxos- or Raft-based). Two examples: chain replication (relatively well-known and widely deployed in at least one major cloud provider), and LCR, a provably write-throughput-optimal, fully symmetric ring-overlay protocol that achieves total ordering with no sequencer. The latter can serve linearizable reads from any replica, with none of the complex consistency concerns of read replicas in Paxos/Raft (or Zookeeper). Also, read throughput scales linearly with the size of the cluster (while write throughput stays roughly constant).<p>Write latency and transient fault tolerance are strictly inferior to quorum-based protocols like Paxos/Raft, but this is less of a concern in LAN environments, where write throughput tends to matter more. Both chain replication and LCR support much higher write throughput than Paxos/Raft, and are thus more suitable for data plane applications in a LAN (specialized Paxos variants like SDPaxos are more appropriate for a WAN). If your system needs to be able to reconfigure itself, though, or you need to mask tail latency/transient faults, then Paxos/Raft are still the way to go.
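For readers who haven't seen chain replication: updates enter at the head, are applied and forwarded node by node, and are acknowledged by the tail, while reads are served by the tail. With f+1 nodes, any f can crash, given the external reconfiguration service mentioned above. The Python below is a deliberately simplified, synchronous, in-process sketch assuming no failures and no reconfiguration; `ChainNode`, `build_chain`, `read`, and `replicas_needed` are hypothetical names, and it does not attempt LCR.

```python
from typing import Dict, List, Optional


class ChainNode:
    """One replica in a (simplified) chain replication setup."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, str] = {}
        self.successor: Optional["ChainNode"] = None

    def write(self, key: str, value: str) -> str:
        # Apply the update locally, then pass it to the next node.
        self.store[key] = value
        if self.successor is None:
            # Tail: the update is now on every replica, so it is committed.
            return self.name
        return self.successor.write(key, value)


def build_chain(names: List[str]) -> List[ChainNode]:
    """Link nodes head -> ... -> tail in the given order."""
    nodes = [ChainNode(n) for n in names]
    for node, nxt in zip(nodes, nodes[1:]):
        node.successor = nxt
    return nodes


def read(chain: List[ChainNode], key: str) -> Optional[str]:
    # Reads go to the tail, which only ever holds committed updates.
    return chain[-1].store.get(key)


def replicas_needed(f: int, chain: bool) -> int:
    """Nodes needed to tolerate f crash faults."""
    # Chain replication: f+1 (reconfiguration handled externally).
    # Majority quorums (Paxos/Raft): 2f+1.
    return f + 1 if chain else 2 * f + 1


if __name__ == "__main__":
    chain = build_chain(["head", "middle", "tail"])
    print(chain[0].write("x", "1"))  # tail
    print(read(chain, "x"))          # 1
    print(replicas_needed(1, chain=True), replicas_needed(1, chain=False))  # 2 3
```

In a real deployment the hops are asynchronous messages and the tail replies to the client directly; the synchronous recursion here only shows the data path and the replica-count arithmetic.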