I have a bit of a man-crush on Leslie Lamport (and have (unsuccessfully) tried to get TLA+ adopted at every job I've had for the last five years). I have read both the original Paxos paper and the Paxos Made Simple paper, and I'll just say that I still found Paxos about five times harder to grok than Raft. Honestly, the TLA+ specification of Paxos made it click more than the papers did, though that could just be because of the repeated readings beforehand.

It's a beautiful algorithm, but Raft has the advantage of being a lot simpler.
My grey matter is failing me, but does anyone remember a rant from 5-10 years ago about developing software in this era that had a punch line about consensus algorithms, something like, "so, there's this guy named Diego?"

Edit: found it: https://circleci.com/blog/its-the-future/
Shameless plug of a blog series where I try to list and summarize every Paxos variant:

https://vadosware.io/post/paxosmon-gotta-concensus-them-all/
Lamport has published a TLA+ spec of the Paxos consensus algorithm¹, but none that I could find of the full Paxos algorithm itself (aka Multi-Paxos).

In 2016, some academics at Stony Brook University did publish one, along with a machine-checked TLAPS proof, updated in 2019:

https://arxiv.org/abs/1606.01387

¹ The Paxos consensus spec refines specs for consensus and voting. See the spec and accompanying material here: https://lamport.azurewebsites.net/tla/paxos-algorithm.html
Is there a mistake in the second diagram, where prepare(1, e) is sent to Byzantium? I wonder if it should be prepare(1, a), since Athens hasn't learned the (1, e) value, which only Ephesus and Delphi know about at that point.

Otherwise, I think it's a pretty good description of Paxos. I like that it walks through a good number of failure scenarios and mentions interesting corner cases, such as the lack of an explicit commit in the original papers and the fact that reading also requires running the full Paxos algorithm.
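To make the prepare notation concrete, here is a minimal sketch of an acceptor's bookkeeping in Python. The class and method names are illustrative, not from the article; proposal numbers are (round, proposer) tuples, so (1, "e") and (1, "a") compare the way the diagram labels suggest.

```python
# Minimal sketch of a Paxos acceptor, assuming the article's notation:
# a proposal number is a (round, proposer) tuple, so prepare(1, a) is
# round 1 initiated by Athens. All names here are illustrative.

class Acceptor:
    def __init__(self):
        self.promised = None   # highest proposal number promised so far
        self.accepted = None   # (number, value) last accepted, if any

    def on_prepare(self, number):
        """Phase 1b: promise to ignore proposals numbered below `number`.

        The reply carries any previously accepted (number, value), which
        the proposer must then re-propose instead of its own value.
        """
        if self.promised is None or number > self.promised:
            self.promised = number
            return ("promise", self.accepted)
        return ("reject", self.promised)

    def on_accept(self, number, value):
        """Phase 2b: accept unless a higher-numbered prepare intervened."""
        if self.promised is None or number >= self.promised:
            self.promised = number
            self.accepted = (number, value)
            return ("accepted", number)
        return ("reject", self.promised)

byzantium = Acceptor()
print(byzantium.on_prepare((1, "a")))       # ('promise', None)
print(byzantium.on_accept((1, "a"), "X"))   # ('accepted', (1, 'a'))
print(byzantium.on_prepare((1, "e")))       # ('promise', ((1, 'a'), 'X'))
```

Note that the prepare message itself carries only the proposal number; any previously accepted value travels back in the promise reply, which is how a proposer learns about values it hasn't seen yet.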
There are huge gaps between the scientific paper and how you actually operate and scale Paxos from an engineering point of view. See "Paxos Made Live: An Engineering Perspective" by Tushar Chandra et al.

Paxos also has drawbacks, such as being slow and lacking liveness guarantees. Some alternatives are discussed in https://aws.amazon.com/builders-library/leader-election-in-distributed-systems/
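For contrast with full Paxos, here is a hedged sketch of the lease-based leader election pattern that articles like the AWS one discuss. The LeaseStore here is an in-memory stand-in for whatever provides the atomic compare-and-set (DynamoDB, ZooKeeper, etc.), and all names are illustrative.

```python
# Hedged sketch of lease-based leader election. In production,
# try_acquire must execute atomically on the backing store; this
# in-memory version only illustrates the lease + fencing-token idea.
import time

class LeaseStore:
    def __init__(self, ttl=5.0):
        self.ttl = ttl          # seconds a lease stays valid without renewal
        self.holder = None
        self.expires = 0.0
        self.token = 0          # monotonically increasing fencing token

    def try_acquire(self, node):
        """Grant (or renew) the lease; return a fencing token, or None.

        A new leader gets a strictly larger token, so downstream storage
        can reject late writes from a deposed leader that still thinks
        it is in charge.
        """
        now = time.monotonic()
        if self.holder == node or now >= self.expires:
            if self.holder != node:
                self.token += 1
            self.holder = node
            self.expires = now + self.ttl
            return self.token
        return None             # someone else holds an unexpired lease

store = LeaseStore(ttl=0.1)
print(store.try_acquire("athens"))    # 1: athens leads
print(store.try_acquire("ephesus"))   # None: lease not yet expired
time.sleep(0.2)
print(store.try_acquire("ephesus"))   # 2: lease expired, ephesus takes over
```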
I prefer a simpler approach where only 100% read uptime is guaranteed: http://talk.binarytask.com/task?id=4662372011677848884

The way to do this is to asynchronously write to all nodes in real time. You can try it out here: http://root.rupy.se

Go to http://root.rupy.se/node?make&info and press Make, and you'll see the replication happen in real time (although it's too fast, so you'll only see it delivered in one go; if you Wireshark it, you'll see multiple HTTP chunks...).

The source is also here, in Root.java: https://github.com/tinspin/rupy
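Purely as an illustration of the write-to-all-nodes idea, here is a Python sketch. This is not the rupy/Root.java code; the node URLs and the /write endpoint are made up, and a truly fire-and-forget version would not even block to count acknowledgements.

```python
# Illustrative sketch of asynchronously fanning a write out to every
# node. NOT the rupy implementation; NODES and /write are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

NODES = ["http://node1", "http://node2", "http://node3"]  # hypothetical

def write_one(node, data):
    """POST the write to a single node; a down node just means False."""
    try:
        req = urllib.request.Request(f"{node}/write", data=data, method="POST")
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def replicate(data):
    """Fan the write out to every node concurrently, with no quorum wait.

    Reads stay available as long as any one node is up; the trade-off is
    that a failed write leaves that node behind until it is repaired.
    """
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        acks = list(pool.map(lambda n: write_one(n, data), NODES))
    return sum(acks)
```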
Paxos has a reputation for being hard to understand. I was under the impression that Raft is now preferred for distributed consensus; its creators explicitly had understandability as a design goal and called out Paxos as being hard to understand.

https://pdos.csail.mit.edu/6.824/papers/raft-extended.pdf