This is good work, and the lock semantics and fencing token (epoch) make a lot of sense. Still, I can't help but think that implementing java.util.concurrent.locks.Lock will turn out to be a liability. The problem here is that the code looks like a Java lock, but the semantics are entirely different with regard to failure. Specifically:<p>> While we’re on this subject, the same logic applies even to the primary FencedLock.lock() call: at the very next line of code in your program, you may no longer be holding the lock.<p>That's not, in most programmers' experience, how locks work. This behavior is necessary (at some level) to deal with partial failures and stalls of clients, but it means that if you use this like a Lock, your code will be very wrong.<p>> Note the key message here: all external services must participate in the fencing-token protocol, with guaranteed linearizability, for the whole setup to uphold its invariants.<p>So this isn't really like a Java lock at all, and instead is a nice, convenient way to build part of an epoch/view change implementation. That's useful, but in my mind the API they chose will reduce the likelihood that non-experts will use this correctly.
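To make the "external services must participate" point concrete, here is a minimal sketch of what the receiving side of a fencing-token protocol might look like. FencedStore and its methods are hypothetical names for illustration, not part of Hazelcast; the sketch assumes tokens are monotonically increasing longs and that the service itself is a single linearizable object.

```java
// Hypothetical external service that checks fencing tokens.
// Not Hazelcast API; an illustrative sketch only.
class FencedStore {
    // Highest fencing token observed so far.
    private long highestToken = Long.MIN_VALUE;
    private String value;

    // Accepts the write only if the caller's token is not older than
    // the newest token this service has already seen.
    synchronized boolean write(long token, String newValue) {
        if (token < highestToken) {
            return false; // caller lost the lock; reject the stale write
        }
        highestToken = token;
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}
```

A client that stalled after acquiring the lock with token 1 gets its write rejected once another client has written with token 2, which is the protection a plain Lock-style call site does not give you on its own.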
I think it's worth adding a few previous discussions about distributed locks.<p>Redlock is a distributed lock using Redis: <a href="https://redis.io/topics/distlock" rel="nofollow">https://redis.io/topics/distlock</a><p>Martin Kleppmann criticized Redlock and mentioned the fencing solution:
<a href="http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html" rel="nofollow">http://martin.kleppmann.com/2016/02/08/how-to-do-distributed...</a><p>Antirez disagrees with the analysis and the HN post has a good discussion:
<a href="https://news.ycombinator.com/item?id=11065933" rel="nofollow">https://news.ycombinator.com/item?id=11065933</a>
More accurately, these are leases, not locks in the traditional sense. The lease expires when the corresponding client session ends, which is detected by the absence of a heartbeat from the client.
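A rough sketch of the lease idea, assuming a fixed session timeout and a caller-supplied clock. The Lease class and its method names are hypothetical and do not reflect how Hazelcast tracks sessions internally.

```java
// Hypothetical lease record kept by the lock service for one client session.
// Time is passed in explicitly so the behavior is deterministic.
class Lease {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    Lease(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    // The lease is expired once no heartbeat has arrived within the timeout.
    boolean isExpired(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    // Records a heartbeat; returns false if the lease had already expired,
    // in which case the client must open a new session and re-acquire.
    boolean heartbeat(long nowMillis) {
        if (isExpired(nowMillis)) {
            return false;
        }
        lastHeartbeatMillis = nowMillis;
        return true;
    }
}
```

The client cannot observe the expiry directly: a stalled client may still believe it holds the lock after the service has already handed it to someone else.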
For those interested, there was also a follow-up blog post, again by Basri, on how it's tested using Jepsen: <a href="https://hazelcast.com/blog/testing-the-cp-subsystem-with-jepsen/" rel="nofollow">https://hazelcast.com/blog/testing-the-cp-subsystem-with-jep...</a>
What happens when some of the quorum members are not available?<p>The fact that split brain is not allowed implies that liveness is given up in exchange.<p>More importantly, what can I possibly do in the scenario where I would like to obtain several locks at the same time?<p>Distributed lock frameworks usually imply that the architecture includes some sort of transaction reversal mechanism.