This is very cool. Simple enough to reason about the invariants easily. I guess one of the key insights is that each piece of data has a canonical server owner, which enforces consistency of writes to that data in a single place.<p>I have one question about determining the latest version of a set of peer data when overlapping transactions occur.<p>Suppose initially s1/x=0 at server s1 and s2/y=0 at server s2. Client1 updates s1/x=10 and s2/y=10 in transaction v1, and at the same time client2 updates s1/x=20 and s2/y=20 in v2. Suppose the clients contact the servers in different orders and the update messages arrive at the servers in reverse order, such that s1's pending queue is [x.v1=10, x.v2=20] and s2's pending queue is [y.v2=20, y.v1=10].<p>After client1 and client2 send the make-good command, s1's good queue is [x.v1=10, x.v2=20] and s2's good queue is [y.v2=20, y.v1=10]. When a third client3 tries to read the latest value of x or y, what is the latest value of its peer data? It looks like, depending on which data client3 starts with, it would get a different version of the peer data: starting with s1/x, it gets s1/x.v2 == 20 and s2/y.v2 == 20, while starting with s2/y it gets s2/y.v1 == 10 and s1/x.v1 == 10.<p>Am I missing something, or is this the semantics for determining the latest values of peer data?
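To make the question concrete, here is a minimal sketch of the scenario above. It assumes (hypothetically, not from the paper) that each good version carries metadata naming its sibling writes, that "latest" means the most recently appended good version, and that a reader resolves peer data by fetching the version from the same transaction:

```python
# Toy model of the two good queues after both make-good commands
# (newest version last). Tuples are (item, transaction_id, value).
s1_good = [("x", "v1", 10), ("x", "v2", 20)]
s2_good = [("y", "v2", 20), ("y", "v1", 10)]

def latest(queue):
    """Take the most recently appended good version as 'latest'."""
    return queue[-1]

def peer(queue, txn):
    """Fetch the peer item's version from the same transaction."""
    return next(v for v in queue if v[1] == txn)

# Start from s1/x: latest is x.v2 = 20, so the matching peer is y.v2 = 20.
_, txn, x_val = latest(s1_good)
_, _, y_val = peer(s2_good, txn)
print(x_val, y_val)   # 20 20

# Start from s2/y: latest is y.v1 = 10, so the matching peer is x.v1 = 10.
_, txn, y_val = latest(s2_good)
_, _, x_val = peer(s1_good, txn)
print(x_val, y_val)   # 10 10
```

Under these assumptions the read is atomic either way (the reader never sees a mix of v1 and v2), but the version it lands on does depend on which item it starts from.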
Very cool and simple. The notion of "time" may be a bit misleading, but the client actually just needs to generate an unguessable random value (think of it as getting a string from /dev/urandom), so time is not part of the protocol at all, which is great (also, avoiding having to deal with client IDs can be nice in practice).
The algorithm does seem nicer than a 2PC protocol in that there is no need for a 2PC coordinator. By distributing the metadata as it does, the clients and servers can figure out what has been committed and what hasn't. However, it doesn't appear to directly include semantics for aborting transactions, which is a pretty important part of a distributed transaction protocol.<p>The paper admits the algorithm does not guarantee termination, but I would have liked to see more details on the failure scenarios regardless (there are minor details in footnote 3). It's not clear to me what writers see (if anything) when a write fails.<p>The paper does talk about how non-overlapping transactions won't block each other (which is nice but not a solution) and how one could add the ability to abort and trigger a cleanup with a good (user-supplied) failure detection module. But having a reliable node-failure monitor that can react fast enough to ensure availability is really the hard part.<p>Would love to see more on aborting transactions next.
This model is somewhat similar to how Datomic models transactions -- Datomic uses immutable datoms and includes a transaction ID with each datom: "A datom consists of an entity, attribute, value and transaction (time)" (<a href="http://www.datomic.com/rationale.html" rel="nofollow">http://www.datomic.com/rationale.html</a>).<p>However, Datomic's design is based on a single, centralized transactor that pushes out all the new transaction information to an index that is distributed to all its clients (peers), whereas NBTA would enable distributed transactors.<p>See "The Datomic Data Model"
<a href="http://www.datomic.com/rationale.html#DataModel" rel="nofollow">http://www.datomic.com/rationale.html#DataModel</a><p>From Datomic's FAQ <a href="http://www.datomic.com/faq.html" rel="nofollow">http://www.datomic.com/faq.html</a> ...<p>How does Datomic provide ACID guarantees?<p>The transactor serializes all transactions; each transaction runs against a stable view of the database and succeeds or fails in its entirety. Transactions are not acknowledged until after they are logged in storage. Atomic read/modify/write operations can be performed by database functions, which run on the transactor within the transaction. Note that Datomic can provide ACID guarantees without utilizing read transactions or read locks, due to the presentation to the query engine(s) of the database as an immutable value.<p>---//---<p>Rich Hickey on Datomic's design...<p>Intro to Datomic (12 min)
<a href="http://www.youtube.com/watch?v=RKcqYZZ9RDY#" rel="nofollow">http://www.youtube.com/watch?v=RKcqYZZ9RDY#</a>!<p>The Design of Datomic (60 min)
<a href="http://www.infoq.com/presentations/The-Design-of-Datomic" rel="nofollow">http://www.infoq.com/presentations/The-Design-of-Datomic</a><p>The Datomic Architecture and Data Model (46 min)
<a href="https://vimeo.com/45136212" rel="nofollow">https://vimeo.com/45136212</a>
If you clear metadata, the following pathological case results in an inconsistent read:<p>* x = 0 and y = 0, starting out in the "good" state with metadata cleared;<p>* client A reads x with no metadata, then for whatever reason blocks or is delayed;<p>* client B writes x = 1 and y = 1, completes the transaction, and the metadata is cleared;<p>* client A reads y = 1.<p>This shouldn't be an issue unless you aggressively clear metadata and have a long-running client.
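The four steps above can be replayed as a toy simulation (names and structure are illustrative, not from the paper): two items whose metadata is cleared after commit, and a slow reader that ends up with a fractured view.

```python
# Starting "good" state with metadata cleared: x = 0, y = 0.
store = {"x": {"value": 0, "meta": None},
         "y": {"value": 0, "meta": None}}

# Client A reads x; with no metadata there is nothing to repair against.
a_x = store["x"]["value"]                    # A sees x = 0

# (A blocks or is delayed here.)

# Client B writes x = 1 and y = 1, completes the transaction,
# and the metadata is cleared again.
store["x"] = {"value": 1, "meta": None}
store["y"] = {"value": 1, "meta": None}

# Client A resumes and reads y.
a_y = store["y"]["value"]                    # A sees y = 1

# A observed (x, y) = (0, 1), a state that never existed:
# it was (0, 0) before B's transaction and (1, 1) after.
print(a_x, a_y)   # 0 1
```

With the metadata still present, A's second read could have detected that y.1 belongs to a transaction that also wrote x, and repaired accordingly; clearing it is what makes the fractured read invisible.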
In the mode where a client is responsible for moving writes from 'pending' to 'good', it wasn't clear to me what happens if the client dies before contacting all the servers in the second pass.<p>Does the data remain stable, or must some additional work be performed to correct the inconsistent state?