Author here. I made this visualization over a decade ago and I'm glad it's still useful for folks! Let me know if you have any questions.<p>I've also been experimenting, on and off, with different techniques for doing the visualization, as I'd like to do more of these. I'm currently looking at trying to make it work with Remotion[1]. The JavaScript version I did for Raft was time-intensive, and I ended up having to write an entire (albeit terrible) implementation of Raft to even get it to work. lol.<p>[1] <a href="https://www.remotion.dev/" rel="nofollow">https://www.remotion.dev/</a>
I've only heard the Raft consensus algorithm thrown around in a few GitHub repos/HN comments but never got a chance to really get to know it.<p>This webpage cleared up some long-standing doubts about what distributed computing means, what a consensus algorithm is and what this Raft thing is.<p>Kudos to the developer. You got a newbie interested in the field!
This is the first I have heard of Raft, but I enjoyed the animations and ideas. I work on multi-node radio communications for ag automation. I had two questions after watching this:<p>- Is Raft alone in this space, or are there other popular algorithms/libraries that fill the same space?<p>- What happens when the node count gets larger than a handful? What happens when you hit hundreds or even thousands of nodes that are trying to achieve consensus? In particular, the part where all of the nodes respond (semi) simultaneously to a broadcast node. In a radio spectrum world, that would be a disaster. N:1 communication slots are choke points for timely communication.
Is it a weakness to only commit on majority consensus? I'm thinking of a very unstable global network, where partitions are happening everywhere. In that scenario, only one cluster can reach consensus (if you're lucky). If the partitions are such that no cluster has majority, nothing can proceed.<p>Is there a better way to proceed with tentative consensus, until a majority cluster can be realized, and then have a conflict resolution strategy? People operate this way.
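To make the majority rule in the question above concrete, here is a minimal Python sketch. The function names are hypothetical and not taken from any Raft library; it only illustrates the arithmetic of which side of a partition can still commit, assuming a fixed cluster membership.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority of the cluster."""
    return cluster_size // 2 + 1


def partitions_that_can_commit(partition_sizes: list[int]) -> list[bool]:
    """For each partition, report whether it holds a majority of the whole cluster.

    Assumption: the partitions together account for every node in the cluster.
    """
    cluster_size = sum(partition_sizes)
    needed = quorum_size(cluster_size)
    return [size >= needed for size in partition_sizes]


# A 5-node cluster split 3/2: only the 3-node side can commit.
print(partitions_that_can_commit([3, 2]))
# A 5-node cluster split 2/2/1: no side has a majority, so nothing commits.
print(partitions_that_can_commit([2, 2, 1]))
```

Because two disjoint partitions can never both hold a strict majority, at most one side makes progress, which is the trade-off the question is asking about.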
Great timing. I'm part of a German podcast on fundamentals of computing [1], and we just recorded an episode on Distributed Systems that discusses Raft as an example. We will probably be adding an addendum to link to this.<p>[1] <a href="https://www.schluesseltechnologie-podcast.de" rel="nofollow">https://www.schluesseltechnologie-podcast.de</a>
On a related note, I’ve found <a href="https://martinfowler.com/articles/patterns-of-distributed-systems/" rel="nofollow">https://martinfowler.com/articles/patterns-of-distributed-sy...</a> to be quite instructive in understanding distributed systems in general.
More generally, the Raft page on GitHub lists some good resources on that subject (including that really good animation):<p><a href="https://raft.github.io/" rel="nofollow">https://raft.github.io/</a>
I've had a surprisingly hard time finding a bare-bones Raft implementation in Java purely for leader election.<p>During the same hunt, I was also surprised to find that there is no common way to do leader election among pods in Kubernetes.
This is just ridiculously good. I am normally a word learner but lately my mind has been going a little bit more visual. This was very very well thought out and helped me enormously.
Previous discussion (in 2020): <a href="https://news.ycombinator.com/item?id=25326645" rel="nofollow">https://news.ycombinator.com/item?id=25326645</a><p>Also, I personally think the current blockchain literature is much more intuitive and easier to follow, for learning about consensus. The Byzantine case isn't really that different than the crash case if we assume cryptography. On the other hand, Raft is a spiderweb of a protocol, very easy to get wrong.
I am working on MIT 6.824[0] by myself as a side project. This visualization is very, very helpful at the beginning, just so that I can build up the right mental model and understand how components interact with each other.<p>[0]: <a href="https://nil.csail.mit.edu/6.824/2020/schedule.html" rel="nofollow">https://nil.csail.mit.edu/6.824/2020/schedule.html</a>
Excellent!<p>A couple of questions:<p>1) In the case of a network partition, the client that is currently connected to the leader, do they get notified that there's a partition, or that the cluster is not in a healthy situation?<p>2) If a client writes to the partition that will get rolled back, and all their transactions get rolled back after the partition heals, do they get notified that their data was rolled back?
So, I am imagining making a distributed storage system (maybe some global database).<p>If I implement Raft on top of S3, I can kinda see this working. Is there a sensible "file system on top of S3-like storage" out there already?
I’ve worked on two implementations of Raft and two other Multi-Paxos implementations, and I still think single-decree Paxos + 2PC is probably a better idea, just super hard to implement.
I ran into this while setting up HashiCorp Vault a year or two ago. It was good at helping me understand what's happening, but I don't particularly like Raft. I want to be able to recover from one server, and I don't want to have to wait for a majority on every transaction should I add many servers. I know it's an impossible problem to solve generally, but I think in many situations an alert saying some specific data had a conflict and might not have been resolved correctly is a much better outcome than an outage.
Unsolicited feedback: use fewer text-appear animations, and allow people to skip through stuff. I've spent a full minute clicking next next and still haven't seen a visualization aside from text slides loading slowly with animations. It's like a long YouTube ad that you cannot skip.
Ugh... how slow the animations are... I read much much faster than that but it feels like playing through an old JRPG that doesn't let you speed up the text playback.