Cassandra developer here.<p>Lots of comments here about how Cassandra is AP so of course you get inconsistent (non-serializable) results.<p>This is true, to a point. I'm firmly convinced that AP is a better way to build distributed systems for fault tolerance, performance, and simplicity. But it's incredibly useful to be able to "opt in" to CP for pieces of the application as needed. That's what Cassandra's lightweight transactions (LWT) are for, and that's what the authors of this piece used.<p>However! Fundamentally, mixing serializable (LWT) and non-serializable (plain UPDATE) ops will produce unpredictable results and that's what bit them here.<p>Basically the same as if you marked half the accesses to a concurrently-updated Java variable with "synchronized" and left it off of the other half as an "optimization."<p>Don't take shortcuts and you won't get burned.
As a long time Cassandra user its easy to forget that some of Cassandra's semantics will be surprising to new users. That being said, if you are considering adopting an AP database it really is important for you to know the details about how write conflicts get resolved. This is perhaps the biggest difference between Cassandra other databases like Riak and ought to be part of your decision making process instead of a surprise you run into later.<p>That being said, using Cassandra for distributed locks is a terrible idea. I can't think of any way in which Cassandra would be better than using {Zookeeper,etcd,consul}. Trying to force a database to do something it really isn't designed for will almost always lead to disappointment (and often resentment) of said database.
This:<p><pre><code> INSERT INTO locks (id, lock, revision)
VALUES ('Tom', true, 1)
IF NOT EXISTS USING TTL 20;
</code></pre>
looks like a race condition. The same problem comes up in SQL databases - you can't lock a row that doesn't exist yet. If you write, in SQL:<p><pre><code> BEGIN TRANSACTION
SELECT FROM locks WHERE id = "Tom" AND lock = true AND revision = 1;
-- if no records returned
INSERT INTO LOCKS locks (id, lock, revision) VALUES ('Tom', true, 1)
COMMIT
</code></pre>
you have a race condition. If two threads make that identical request near-simultaneously, both get a no-find from the select, and both do the INSERT.
SELECT doesn't lock rows that don't exist.<p>The usual solution in SQL is to use UNIQUE indices which will cause an INSERT to fail if the record about to be inserted already exists.<p>I ran into this reassembling SMS message fragments, where I wanted to detect that all the parts had come in. The right answer was to do an INSERT for each new fragment, then COMMIT, then do a SELECT to see if all the fragments of a message were in. Doing the SELECT first produced a race condition.
And there's another surprise waiting to be discovered. The execution of a LWT is not guaranteed to return applied/not-applied response [1]. It can raise a WriteTimeout exception that means "I don't know if applied". It looks like in that case it can be worked around by inserting a UUID and in case of a WriteTimeout reading the UUID using SERIAL consistency and checking if it's the inserted UUID. But generally this limitation of LWTs makes implementing some algorithms impossible, e.g. you can't implement a 100% reliable counter.<p>[1] <a href="https://issues.apache.org/jira/browse/CASSANDRA-9328" rel="nofollow">https://issues.apache.org/jira/browse/CASSANDRA-9328</a>
I railed against CQL right from the start and it's precisely because of this kind of thing. Imitating SQL has the side effect of setting certain expectations and drags a certain mental model along with it.
If you're not writing purely immutable data or can't 100% guarantee a serialized reader/writer, then you're just looking for trouble with Cassandra.
Shouldn't Cassandra be using Lamport timestamps or even vector clocks there? Relying on timer and its resolution sounds strange for a database, especially a distributed one.
Author here.<p>Great discussion around the CAP theorem but it misses the point. AP vs CP / Cassandra being AP is not relevant to this particular problem:<p>1) This is not a distributed systems corner case. You will run into this if you are running Cassandra on a single node. A node should be able to guarantee consistency internally during normal operation. If it is not able to do that, there is something wrong with the system.<p>2) This is a case where queries are being send from the same process/thread and go to exactly the same nodes. Attach a simple, monotonically increasing query counter to each call and you can easily serialize it on the other side.
I see this as a basic misunderstanding of how LWT works. If you want to ensure serializable operations, then you need to use LWT with preconditions that ensure serializable operations.<p>Even better, stop trying to emulate the old and tired distributed lock methods that have been proven over and over again to be insufficient.
I guess I'm old or just not hip (most likely both) but I had to google WAT (I know WTF but WAT ... never seen it).<p>Even now I'm still not sure but I presume WAT = what!
Hazelcast has a distributed lock (see <a href="http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#lock" rel="nofollow">http://docs.hazelcast.org/docs/3.7/manual/html-single/index....</a>) and I've used for more than a year to synchronize jobs across a cluster.
With the Oracle/PostgreSQL, readers never wait for writers and writers never wait for readers <a href="http://philip.greenspun.com/sql/your-own-rdbms.html" rel="nofollow">http://philip.greenspun.com/sql/your-own-rdbms.html</a>
using underlying locking mechanism
<a href="http://www.beej.us/guide/bgipc/output/html/singlepage/bgipc.html#flocking" rel="nofollow">http://www.beej.us/guide/bgipc/output/html/singlepage/bgipc....</a>
Its' better to implement <a href="https://en.wikipedia.org/wiki/Priority_inversion" rel="nofollow">https://en.wikipedia.org/wiki/Priority_inversion</a> in all <a href="https://en.wikipedia.org/wiki/NoSQL#Types_and_examples_of_NoSQL_databases" rel="nofollow">https://en.wikipedia.org/wiki/NoSQL#Types_and_examples_of_No...</a>
It feels like it is 2013 all over again: <a href="https://aphyr.com/posts/294-jepsen-cassandra" rel="nofollow">https://aphyr.com/posts/294-jepsen-cassandra</a>