I'm including the Gryadka author's rebuttal here, for completeness:<p>Thank you for the analysis of my post but it seems that you didn't get it correctly. Even to read a value you have to execute the full cycle (prepare, accept) of consensus (in the case of the stable leader we can skip prepare), so when you read the nil value the state of the system will be:<p>A: (value=foo ballot=1 promised=2)
B: (value=nil ballot=2 promised=2)
C: (value=nil ballot=2 promised=2)<p>Not the one you mentioned in the post:<p>A: (value=foo ballot=1 promised=2)
B: (value=nil ballot=0 promised=2)
C: (value=nil ballot=0 promised=2)<p>So the counter example is incorrect.<p>I proved the algorithm mathematically by hand and used very aggressive property based testing with fault injections so I'm pretty confident in its correctness.
This is a very good analysis.<p>I think it makes a slightly stronger argument about MultiPaxos than it needs to. There are other correct ways to use single-decree Paxos, as long as you recognize it's limitations. The true limitation is "use every register once", and a log is the gold standard way to do that. Another correct pattern could be a set-only K/V store, or a store for monotonic finite state machine states (at the cost of O(N) rounds for every contender). One real-world example is 2PC coordination, where the state machine is very simple, and can be modeled as an ordered pair of three registers.
<a href="http://www.allreadable.com/5b354QWp" rel="nofollow">http://www.allreadable.com/5b354QWp</a><p>"every consensus protocol out there
or every fully distributed consensus protocol
is either Paxos or Paxos with cruft or broken" - Mike Burrows
I haven't done any in depth analysis of gryadka but I think the premise of the argument here may be wrong. Even if a particular value isn't accepted by a majority of nodes, that doesn't necessarily make it a dirty read. As long as anyone participating in the algorithm sees the history as if it was committed right before the cas result is committed, it could still be linearizable. I would need to create a formal model to be sure whether it is correct or not, but don't just assume it's wrong because it isn't paxos (which it isn't and it shouldn't advertise like it is).
I think the most interesting thing the article states is the following question (and proposed answer):<p>Is it possible to get compare-and-swap without the log?<p>TL;DR: I don’t think so.<p>Riak Ensemble is such a system that utilizes single-decree Paxos in order to try to achieve CAS: <a href="https://github.com/basho/riak_ensemble" rel="nofollow">https://github.com/basho/riak_ensemble</a><p>If the only the distinguished leader is allowed to publish a proposal for value change, then this allows you to preserve the CAS invariant. Unfortunately, this comes at the sacrifice of liveness. If you then have a second Paxos group to elect the leader, then this removes the liveness issue. On view change, you have to contact a quorum of nodes, and re-propose a new epoch in order to avoid issues.<p>I'm curious if someone can poke a hole in riak_ensemble's algorithm.
It would have been nicer to contact the authors of the algorithm before making these (apparently incorrect) claims via blog post. It's just sensationalistic.
I know that consensus is probably the cleanest way to guarantee very strict correctness, but I'm kind of not convinced that synchronous replication is the ideal solution for HA. I have yet to see clear evidence that teams at less-than-Google scale aren't mostly getting by with semisynchronous replication, or even just asynchronous replication.<p>Always running an ensemble of three or five nodes for each shard seems to be pretty overkill (and expensive, both in terms of money and latency), especially if it's mostly just a way to do automated failovers. I sometimes wonder if there's a good-enough cheaper semisync alternative.
Everybody upvoting this without any discussion.<p>I suppose "It's not Paxos, so it's probably wrong" is beyond discussion (not that I disagree, of course).