Obligatory shout-out to Apache Curator: <a href="http://curator.apache.org/" rel="nofollow">http://curator.apache.org/</a><p>Curator implements a bunch of the algorithms that are commonly built on top of ZooKeeper. I like to think of ZooKeeper as the distributed systems equivalent of peer-reviewed implementations of cryptographic primitives. Curator is like a whole cryptographic protocol / cryptosystem. In both cases: don't implement your own!
I use ZooKeeper in production for snitch.io.<p>There are some interesting newer alternatives such as etcd, Serf and Consul, but at the time ZooKeeper had the best track record (under Jepsen analysis). Things might have changed since then.<p>Aphyr has analysed a bunch of these systems with his Jepsen tool: <a href="http://aphyr.com/tags/jepsen" rel="nofollow">http://aphyr.com/tags/jepsen</a> and <a href="http://aphyr.com/posts/291-call-me-maybe-zookeepe" rel="nofollow">http://aphyr.com/posts/291-call-me-maybe-zookeepe</a><p>If you are going to use ZooKeeper, I strongly suggest looking at both Apache Curator and Netflix Exhibitor (they are complementary).<p>The examples bundled with ZK don't handle all errors and edge cases.<p>Curator is a library of common patterns that you can use mostly out of the box.<p>Exhibitor is a ZooKeeper-aware supervisor system: <a href="https://github.com/Netflix/exhibitor" rel="nofollow">https://github.com/Netflix/exhibitor</a><p>Also, always remember that your ensemble should have an odd number of nodes (3, 5, 7): ZooKeeper needs a majority of nodes to be up, so an even-sized ensemble tolerates no more failures than the odd-sized one just below it.
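To make the odd-number advice concrete, here is a small sketch of the majority-quorum arithmetic. The function names are made up for illustration, and it assumes a plain majority quorum with every node voting (no observers or weighted groups).

```python
def quorum_size(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes can be down while a majority is still reachable."""
    return n - quorum_size(n)


# Compare ensemble sizes 3 through 7.
for n in range(3, 8):
    print(f"nodes={n} quorum={quorum_size(n)} tolerates={tolerated_failures(n)}")
```

Going from 3 to 4 nodes (or from 5 to 6) raises the quorum size without raising the number of failures you can survive, which is why the extra even node buys nothing.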
If you enjoyed this, I highly recommend Mikito Takada's "Distributed systems for fun and profit" <a href="http://book.mixu.net/distsys" rel="nofollow">http://book.mixu.net/distsys</a><p>The "Partition-tolerant consensus algorithms: Paxos, Raft, ZAB" section is relevant, along with the "Further Reading" which follows it.
You may also be interested in <a href="http://aphyr.com/posts/291-call-me-maybe-zookeeper" rel="nofollow">http://aphyr.com/posts/291-call-me-maybe-zookeeper</a>
I've used ZooKeeper not only as a service registry but also as a fairly small message queue. I wanted to be sure that each message would be delivered at least once, and thanks to the LockingQueue recipe in Kazoo (a Python ZK library) I got what I wanted really easily, with all the benefits of ZK's clustered nature.
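For anyone curious what at-least-once consumption with LockingQueue can look like, here is a rough sketch, not the code from the comment above. The helper `process_one`, the host address and the queue path are all made up for illustration; the `get` / `consume` / `release` calls are the ones Kazoo's LockingQueue recipe provides, and `main()` assumes the kazoo package is installed and a ZooKeeper ensemble is reachable.

```python
def process_one(queue, handler, timeout=5):
    """Handle a single queue item with at-least-once semantics.

    `queue` is anything exposing LockingQueue-style get/consume/release.
    Returns True if an item was handled, False if nothing arrived in time.
    """
    item = queue.get(timeout=timeout)  # locks the item, does not delete it
    if item is None:
        return False
    try:
        handler(item)
    except Exception:
        queue.release()  # drop the lock so the item can be retried
        raise
    queue.consume()  # delete the item only after the handler succeeded
    return True


def main():
    # Imported here so the helper above can be used without kazoo installed.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")  # hypothetical address
    zk.start()
    try:
        queue = zk.LockingQueue("/queues/demo")  # hypothetical path
        queue.put(b"hello")
        process_one(queue, print)
    finally:
        zk.stop()
```

Because the item is only removed after the handler returns, a consumer that dies mid-way leaves it in the queue for someone else to pick up. The flip side is that the same message can be seen more than once, so the handler should be idempotent.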