TechEcho

10 comments

josh2600 almost 4 years ago

OK, so having worked on distributed consensus a bunch, here are a couple thoughts in no particular order:

* In the real world, servers misbehave, like, a lot, so you have to deal with that fact. All assumptions about network robustness in particular will be proven wrong on a long enough timeline.
* Leader election in a world without a robust network is an exercise in managing acceptable failure tolerances. Every application has a notion of acceptable failure rates or acceptable downtime. Discovering that is a non-trivial effort.

Jeff Dean and others at Google famously came to the conclusion that it was OK for some parts of the Gmail service to be down for limited periods of time. Accepting that all self-healing/robust systems will eventually degrade and have to be restored is the first step in building something manageable. The AXD301 is the most robust system ever built by humans to my knowledge (I think it did 20 years of uptime in production). Most other systems will fail long before that. Managing systems as they fail is an art, particularly as all systems operate in a degraded state.

In short, in a lab environment networks function really well. In the real world, it's a jungle. Plan accordingly.
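To make "acceptable downtime" concrete, here is a minimal sketch that turns an availability target into a yearly downtime budget. The helper name `downtime_budget_minutes` and the targets in the loop are illustrative assumptions, not figures from any particular service.

```python
# Hypothetical helper: convert an availability target into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, permitted by an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_minutes


# Illustrative targets only; each application has to discover its own.
for target in (0.99, 0.999, 0.9999, 0.99999):
    budget = downtime_budget_minutes(target)
    print(f"{target:.5f} -> {budget:9.1f} minutes of downtime per year")
```

Picking the target is the hard part; the arithmetic only shows how quickly the budget shrinks with each extra nine.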

benlivengood almost 4 years ago

Since the article mentions Google as the outlier preferring Paxos, I may be able to shed some light from a few years ago.

The Paxos, paxosdb, and related libraries (despite the name, all are multi-paxos) are solid and integrated directly into a number of products (Borg, Chubby, CFS, Spanner, etc.). There are years of engineering effort and unit tests behind the core Paxos library and so it makes sense to keep using and improving it instead of going off to Raft. As far as I am aware the Google Paxos implementation predates Raft by quite a while.

I think in general if most other people use Raft it's better for the community to have single, stable, and well-tested shared implementations for much the same reason it's good for Google to stick with Paxos.

jiryu almost 4 years ago

I gave the presentation in the linked article; here's a written version of it: https://emptysqua.re/blog/paxos-vs-raft/

I hope that the "Paxos vs Raft" debate can die down, now that engineers are learning TLA+ and distributed systems more thoroughly. These days we can design new protocols and prove their correctness, instead of always relying on academics. For example, at MongoDB we considered adopting the reconfiguration protocol from Raft, but instead we designed our own and checked it with TLA+. See "Design and Verification of a Logless Dynamic Reconfiguration Protocol in MongoDB Replication" for details: https://arxiv.org/pdf/2102.11960.pdf
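As a toy illustration of checking a property mechanically instead of arguing it in prose, the sketch below brute-forces the quorum intersection property that both Paxos and Raft rely on: any two majorities of a cluster share at least one node. This is plain Python over small cluster sizes, not TLA+, and the function names are assumptions made for the example.

```python
from itertools import combinations


def majorities(nodes):
    """Yield every subset of `nodes` that forms a strict majority."""
    nodes = list(nodes)
    need = len(nodes) // 2 + 1  # smallest strict majority
    for size in range(need, len(nodes) + 1):
        for subset in combinations(nodes, size):
            yield frozenset(subset)


def all_quorums_intersect(nodes):
    """Exhaustively check that any two majority quorums share a node."""
    quorums = list(majorities(nodes))
    return all(a & b for a in quorums for b in quorums)


# Exhaustive check for clusters of 1 to 7 nodes.
for n in range(1, 8):
    assert all_quorums_intersect(range(n)), f"failed for n={n}"
```

An exhaustive check like this only covers the sizes it enumerates; a model checker such as TLC explores protocol states in a similar spirit, but over a full specification.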

littlestymaar almost 4 years ago

Heidi Howard, the first author of this paper, did two videos about it:

- A short 10-minute intro: https://www.youtube.com/watch?v=JQss0uQUc6o
- A more in-depth one: https://www.youtube.com/watch?v=0K6kt39wyH0

brickbrd almost 4 years ago

In practice, for the systems where I built a replication system from the ground up, once you factor in all the performance, scale, storage layer and networking implications, this Paxos vs. Raft thing is largely a theoretical discussion.

Basic Paxos is, well, too basic, and people mostly run modified versions of it to get higher throughput and better latencies. After those modifications, it does not look very different from Raft with its own modifications applied for storage integration and so on.
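For a sense of how small "basic" is, here is a minimal sketch of a single-decree Paxos acceptor. It assumes integer ballot numbers and string values, holds state in memory only, and leaves out everything a production system adds (durable storage, networking, a replicated log, leader leases). The class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def on_prepare(self, ballot: int):
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", self.promised)

    def on_accept(self, ballot: int, value: str):
        """Phase 2b: accept the value unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return ("accepted", ballot)
        return ("nack", self.promised)


a = Acceptor()
print(a.on_prepare(1))      # first proposer gets a promise, nothing accepted yet
print(a.on_accept(1, "x"))  # its value is accepted at ballot 1
print(a.on_prepare(2))      # a later proposer learns about (1, "x")
print(a.on_accept(1, "y"))  # the stale ballot is rejected
```

Everything that makes real deployments fast, such as a stable leader that skips phase 1 and batches log entries, sits on top of this core, which is where the implementations start to resemble Raft.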

butterisgood almost 4 years ago

VR (Viewstamped Replication) for the win... (no, not that VR): http://pmg.csail.mit.edu/papers/vr-revisited.pdf

OK, maybe not for the win, but it's worth a look. I'm actually fairly certain one of the Paxos implementations I've worked with and used is really more of a VR-style bent on Paxos anyway.

hutrdvnj almost 4 years ago

Can someone provide a short description of the differences between Paxos and Raft?

lowbloodsugar almost 4 years ago

Problem: "Raft protocol described and analyzed in English has problems." Solution: "Here is a modification to the protocol, described in English, that does not have such a problem."

Seems like there is a common failure mode of "describing distributed protocols in plain English and thinking this is a proof"?

The actual problem here is not "There was a problem in the Raft protocol, and I figured it out and provided a workaround". The actual problem here is "Reasonably experienced software engineers reviewed the specification and didn't see any problems." This actual problem has not been addressed by the article.

tschellenbach almost 4 years ago

Stream's consensus algorithms are all Raft-based; the Go Raft ecosystem is very solid. We did end up forking some of the libraries, but nothing major.

aneutron almost 4 years ago

It's not possible to do so, since it would be done asynchronously. I'm sorry, I'll see myself out.

Paxos vs. Raft: Have we reached consensus on distributed consensus?
