科技回声

13 条评论

neuronsguy大约 8 年前

I'm including the Gryadka author's rebuttal here, for completeness:Thank you for the analysis of my post but it seems that you didn't get it correctly. Even to read a value you have to execute the full cycle (prepare, accept) of consensus (in the case of the stable leader we can skip prepare), so when you read the nil value the state of the system will be:A: (value=foo ballot=1 promised=2) B: (value=nil ballot=2 promised=2) C: (value=nil ballot=2 promised=2)Not the one you mentioned in the post:A: (value=foo ballot=1 promised=2) B: (value=nil ballot=0 promised=2) C: (value=nil ballot=0 promised=2)So the counter example is incorrect.I proved the algorithm mathematically by hand and used very aggressive property based testing with fault injections so I'm pretty confident in its correctness.

评论 #13976207 未加载

mjb大约 8 年前

This is a very good analysis.I think it makes a slightly stronger argument about MultiPaxos than it needs to. There are other correct ways to use single-decree Paxos, as long as you recognize it's limitations. The true limitation is "use every register once", and a log is the gold standard way to do that. Another correct pattern could be a set-only K/V store, or a store for monotonic finite state machine states (at the cost of O(N) rounds for every contender). One real-world example is 2PC coordination, where the state machine is very simple, and can be modeled as an ordered pair of three registers.

评论 #13951115 未加载

评论 #13954154 未加载

RcouF1uZ4gsC大约 8 年前

<a href="http://www.allreadable.com/5b354QWp" rel="nofollow">http://www.allreadable.com/5b354QWp</a>"every consensus protocol out there or every fully distributed consensus protocol is either Paxos or Paxos with cruft or broken" - Mike Burrows

评论 #13951975 未加载

grogers大约 8 年前

I haven't done any in depth analysis of gryadka but I think the premise of the argument here may be wrong. Even if a particular value isn't accepted by a majority of nodes, that doesn't necessarily make it a dirty read. As long as anyone participating in the algorithm sees the history as if it was committed right before the cas result is committed, it could still be linearizable. I would need to create a formal model to be sure whether it is correct or not, but don't just assume it's wrong because it isn't paxos (which it isn't and it shouldn't advertise like it is).

评论 #13954431 未加载

评论 #13976231 未加载

sargun大约 8 年前

I think the most interesting thing the article states is the following question (and proposed answer):Is it possible to get compare-and-swap without the log?TL;DR: I don’t think so.Riak Ensemble is such a system that utilizes single-decree Paxos in order to try to achieve CAS: <a href="https://github.com/basho/riak_ensemble" rel="nofollow">https://github.com/basho/riak_ensemble</a>If the only the distinguished leader is allowed to publish a proposal for value change, then this allows you to preserve the CAS invariant. Unfortunately, this comes at the sacrifice of liveness. If you then have a second Paxos group to elect the leader, then this removes the liveness issue. On view change, you have to contact a quorum of nodes, and re-propose a new epoch in order to avoid issues.I'm curious if someone can poke a hole in riak_ensemble's algorithm.

norswap大约 8 年前

It would have been nicer to contact the authors of the algorithm before making these (apparently incorrect) claims via blog post. It's just sensationalistic.

gtrubetskoy大约 8 年前

For those wondering, "gryadka" is "bed" as in vegetable/garden/flower bed in Russian.

评论 #13951868 未加载

elvinyung大约 8 年前

I know that consensus is probably the cleanest way to guarantee very strict correctness, but I'm kind of not convinced that synchronous replication is the ideal solution for HA. I have yet to see clear evidence that teams at less-than-Google scale aren't mostly getting by with semisynchronous replication, or even just asynchronous replication.Always running an ensemble of three or five nodes for each shard seems to be pretty overkill (and expensive, both in terms of money and latency), especially if it's mostly just a way to do automated failovers. I sometimes wonder if there's a good-enough cheaper semisync alternative.

评论 #13953592 未加载

评论 #13953556 未加载

评论 #13953593 未加载

nsxwolf大约 8 年前

That article was like stepping foot on an alien world. I have never even heard of a single one of those things.

评论 #13952055 未加载

评论 #13951878 未加载

jolux大约 8 年前

What about Raft?

vittore大约 8 年前

Can we have TLA+ for Gryadka done by some?

评论 #13954177 未加载

whatnotests大约 8 年前

Everybody upvoting this without any discussion.I suppose "It's not Paxos, so it's probably wrong" is beyond discussion (not that I disagree, of course).

评论 #13952756 未加载

jancsika大约 8 年前

"If it’s not Paxos, it’s probably wrong."Is Bitcoin Paxos?

评论 #13951452 未加载

13 条评论

neuronsguy大约 8 年前

评论 #13976207 未加载

mjb大约 8 年前

评论 #13951115 未加载

评论 #13954154 未加载

RcouF1uZ4gsC大约 8 年前

评论 #13951975 未加载

grogers大约 8 年前

评论 #13954431 未加载

评论 #13976231 未加载

sargun大约 8 年前

norswap大约 8 年前

It would have been nicer to contact the authors of the algorithm before making these (apparently incorrect) claims via blog post. It's just sensationalistic.

gtrubetskoy大约 8 年前

For those wondering, "gryadka" is "bed" as in vegetable/garden/flower bed in Russian.

评论 #13951868 未加载

elvinyung大约 8 年前

评论 #13953592 未加载

评论 #13953556 未加载

评论 #13953593 未加载

nsxwolf大约 8 年前

That article was like stepping foot on an alien world. I have never even heard of a single one of those things.

评论 #13952055 未加载

评论 #13951878 未加载

jolux大约 8 年前

What about Raft?

vittore大约 8 年前

Can we have TLA+ for Gryadka done by some?

评论 #13954177 未加载

whatnotests大约 8 年前

Everybody upvoting this without any discussion.I suppose "It's not Paxos, so it's probably wrong" is beyond discussion (not that I disagree, of course).

评论 #13952756 未加载

jancsika大约 8 年前

"If it’s not Paxos, it’s probably wrong."Is Bitcoin Paxos?

评论 #13951452 未加载

Gryadka is not Paxos, so it's probably wrong

13 条评论

Gryadka is not Paxos, so it's probably wrong

13 条评论