(Reposting my blog comment here)<p>"I mean primarily the ability to face a network partition and continue providing write service to all clients transitively connected to a majority of servers. Secondarily, I mean the ability to continue providing read-only service to all clients connected to any server."<p>A truly available system, in the sense of CAP, allows writes, not just reads, even under partition, even for clients not connected to the "majority" nodes. This leads inevitably to the need for conflict detection and resolution, and that whole "eventually consistent" thing. What you are describing is very useful, hence the existence of things like Chubby and ZK, but it is most definitely not "available", per CAP.<p>Folks might also be interested in the classic paper on building wait-free concurrent objects from low-level primitives like CAS, Herlihy's "Wait-free Synchronization": <a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A01FADDC263D88A280509C0D8BF86C6F?doi=10.1.1.87.871&rep=rep1&type=pdf" rel="nofollow">http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=A01...</a>
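To make the "conflict detection" part concrete: a majority-based system (Chubby, ZK) simply refuses writes on the minority side, so two divergent versions never exist. A CAP-available system accepts writes on both sides and has to notice afterwards that they raced. Version vectors are one common way to do that. Below is a minimal sketch; the helper names (`dominates`, `compare`, `bump`, `merge`) are my own and not any particular database's API.

```python
from typing import Dict

# A version vector maps replica id -> number of writes that replica has applied.
VersionVector = Dict[str, int]


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if `a` has seen every write that `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two versions as 'equal', 'a_newer', 'b_newer' or 'concurrent'."""
    a_ge, b_ge = dominates(a, b), dominates(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    # Neither saw the other's write: a conflict the application must resolve.
    return "concurrent"


def bump(vv: VersionVector, node: str) -> VersionVector:
    """Record one local write on `node`, returning a new vector."""
    out = dict(vv)
    out[node] = out.get(node, 0) + 1
    return out


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise max: the vector for a value that resolves both `a` and `b`."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


# Replicas A and B each accept a write during a partition.
left, right = bump({}, "A"), bump({}, "B")
print(compare(left, right))               # concurrent
print(compare(merge(left, right), left))  # a_newer
```

The point is that the system can only *detect* the concurrent case; choosing the winning value (last-writer-wins, a merge function, asking the user) is left to someone else, which is exactly the "eventually consistent" baggage that the majority approach avoids.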
A number of other systems offer this same combination of consistency and high availability. For example, Cassandra, which is freely available (as the poster requires), appears to give you this behavior if you use ConsistencyLevel QUORUM for both reads and writes.<p>Clustrix, the company I work for, offers a full SQL data store with similar quorum semantics. However, it's not free.<p>Google Megastore offers similar consistency semantics (with its own data model) in a cross-datacenter "cloud" fashion. It's also not free, but it would probably suit some Heroku customers, particularly those already using Google App Engine.
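Why QUORUM on both sides matters: with a replication factor N, a quorum is a majority, floor(N/2) + 1, so any read quorum and any write quorum share at least one replica (R + W > N), and that replica has the latest acknowledged write. Here is a small brute-force check of that overlap property. It's a model of the arithmetic only; it doesn't touch Cassandra's client API, and it ignores things like hinted handoff and read repair.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Majority quorum size: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-replica read set intersect every w-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # a non-empty intersection is truthy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


for n in (3, 5):
    q = quorum(n)
    print(n, q, always_overlap(n, q, q))  # QUORUM reads + QUORUM writes always overlap
print(always_overlap(3, 1, 2))            # ONE read + QUORUM write: can miss the write
```

Dropping either side to ONE (R + W <= N) means a read can land entirely on replicas that missed the write, which is why the guarantee needs QUORUM on both reads and writes.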
"The only freely-available tool that attempts to provide both of these is zookeeper, but zookeeper is specialized, focused on locking and server management. We need a general-purpose tool that provides the same guarantees in a clean, well-designed package."<p>I think they dismissed Zookeeper too quickly without first trying to understand it. The Zookeeper primitives (get/set nodes and their children) seem simpler and cleaner than doozer's client protocol, AFAICT. Zookeeper should also scale better than typical Paxos-based systems, especially for readers, since any server can answer reads locally instead of going through the quorum.
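To show how small that primitive set is, here's a toy, single-process model of a znode tree: create a node, get/set its data, list its children, with a per-node version for conditional writes (loosely mirroring ZooKeeper's setData, where a version of -1 means "any version"). `ToyZnodeTree` and its method names are hypothetical; this is not the ZooKeeper client API and has no replication, sessions, watches or ephemeral nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class BadVersionError(Exception):
    """Raised when a conditional set sees a different version than expected."""


@dataclass
class _Node:
    data: bytes = b""
    version: int = 0  # bumped on every successful set
    children: Dict[str, "_Node"] = field(default_factory=dict)


class ToyZnodeTree:
    def __init__(self) -> None:
        self._root = _Node()

    def _lookup(self, path: str) -> _Node:
        node = self._root
        for part in (p for p in path.split("/") if p):
            node = node.children[part]  # KeyError if any path component is missing
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._lookup(parent_path)  # parent must already exist
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = _Node(data=data)

    def get(self, path: str) -> Tuple[bytes, int]:
        node = self._lookup(path)
        return node.data, node.version

    def set(self, path: str, data: bytes, expected_version: int = -1) -> int:
        node = self._lookup(path)
        if expected_version != -1 and expected_version != node.version:
            raise BadVersionError(f"{path}: expected {expected_version}, found {node.version}")
        node.data = data
        node.version += 1
        return node.version

    def get_children(self, path: str) -> List[str]:
        return sorted(self._lookup(path).children)


tree = ToyZnodeTree()
tree.create("/app")
tree.create("/app/config", b"v1")
tree.set("/app/config", b"v2", expected_version=0)  # succeeds, version becomes 1
print(tree.get("/app/config"))                      # (b'v2', 1)
```

The versioned set is what turns this into a general-purpose coordination tool: a read-modify-write that passes the version it read fails cleanly if someone else got there first.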
Cool. Is the consensus algorithm straight-up Paxos, or are there modifications? The Paxos Made Live [1] paper from Google raised some interesting issues.<p>[1] <a href="http://labs.google.com/papers/paxos_made_live.html" rel="nofollow">http://labs.google.com/papers/paxos_made_live.html</a>
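For comparison's sake, "straight-up" single-decree Paxos fits in a few dozen lines when everything is in memory. The sketch below assumes unique ballot numbers per proposer and synchronous calls standing in for messages. The engineering issues a real deployment has to handle, such as persisting acceptor state to disk and leader leases, are deliberately left out. The names are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1         # highest ballot this acceptor has promised
    accepted_ballot: int = -1  # ballot of the accepted value, -1 if none
    accepted_value: Any = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Any]]:
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return (self.accepted_ballot, self.accepted_value)
        return None  # reject: already promised an equal or higher ballot

    def accept(self, ballot: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any,
            reachable: Optional[List[int]] = None) -> Optional[Any]:
    """Run one round against the reachable acceptors; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1  # majority of ALL acceptors, not just reachable ones
    targets = acceptors if reachable is None else [acceptors[i] for i in reachable]
    promises = [p for p in (a.prepare(ballot) for a in targets) if p is not None]
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value accepted under the highest ballot, if there is one.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value
    accepts = sum(a.accept(ballot, value) for a in targets)
    return value if accepts >= majority else None


accs = [Acceptor() for _ in range(5)]
print(propose(accs, 1, "x", reachable=[0, 1, 2]))  # x
print(propose(accs, 2, "y", reachable=[2, 3, 4]))  # x (acceptor 2 carries it over)
```

The second proposer only overlaps the first majority at one acceptor, but that is enough for it to discover and re-propose "x" instead of its own "y".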