This is very cool. Simple enough to reason about the invariants easily. I guess one of the key insights is that each piece of data has a canonical server owner, which enforces consistency of writes to that data in a single place.<p>I have one question about determining the latest version of a set of peer data when overlapping transactions occur.<p>Suppose initially s1/x=0 at server s1 and s2/y=0 at server s2. Client1 updates s1/x=10 and s2/y=10 in transaction v1, and at the same time client2 updates s1/x=20 and s2/y=20 in v2. Suppose the clients contact the servers in different orders and the update messages arrive at the servers in reverse order, such that s1's pending queue is [x.v1=10, x.v2=20] and s2's pending queue is [y.v2=20, y.v1=10].<p>After client1 and client2 send the make-good command, s1's good queue is [x.v1=10, x.v2=20] and s2's good queue is [y.v2=20, y.v1=10]. When a third client3 tries to read the latest value of x or y, what is the latest value of its peer data? It looks like, depending on which data client3 starts with, it would get a different version of the peer data: starting with s1/x, it gets s1/x.v2 == 20 and s2/y.v2 == 20, while starting with s2/y it gets s2/y.v1 == 10 and s1/x.v1 == 10.<p>Am I missing something, or is this the semantics for determining the latest values of peer data?
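To make the question concrete, here is a minimal sketch of the scenario above. It assumes (hypothetically, not from the paper) that each good version carries metadata naming its sibling writes, that "latest" means the most recently appended good version, and that a reader resolves peer data by fetching the version from the same transaction:

```python
# Toy model of the two good queues after both make-good commands
# (newest version last). Tuples are (item, transaction_id, value).
s1_good = [("x", "v1", 10), ("x", "v2", 20)]
s2_good = [("y", "v2", 20), ("y", "v1", 10)]

def latest(queue):
    """Take the most recently appended good version as 'latest'."""
    return queue[-1]

def peer(queue, txn):
    """Fetch the peer item's version from the same transaction."""
    return next(v for v in queue if v[1] == txn)

# Start from s1/x: latest is x.v2 = 20, so the matching peer is y.v2 = 20.
_, txn, x_val = latest(s1_good)
_, _, y_val = peer(s2_good, txn)
print(x_val, y_val)   # 20 20

# Start from s2/y: latest is y.v1 = 10, so the matching peer is x.v1 = 10.
_, txn, y_val = latest(s2_good)
_, _, x_val = peer(s1_good, txn)
print(x_val, y_val)   # 10 10
```

Under these assumptions the read is atomic either way (the reader never sees a mix of v1 and v2), but the version it lands on does depend on which item it starts from.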
Very cool and simple. The notion of "time" may be a bit misleading, but the client actually just needs to generate an unguessable random value (think of it as getting a string from /dev/urandom), so time is not part of the protocol at all, which is great (also, avoiding having to deal with client IDs can be nice in practice).
The algorithm does seem nicer than a 2PC protocol in that there is no need for a 2PC coordinator. By distributing the metadata as it does, the clients and servers can figure out what has been committed and what hasn't. However, it doesn't appear to directly include semantics for aborting transactions, which is a pretty important part of a distributed transaction protocol.<p>The paper admits the algorithm does not guarantee termination, but I would have liked to see more details on the failure scenarios regardless (there are minor details in footnote 3). It's not clear to me what writers see (if anything) when a write fails.<p>The paper does talk about how non-overlapping transactions won't block each other (which is nice but not a solution) and how one could add the ability to abort and trigger a cleanup with a good (user-supplied) failure detection module. But having a reliable node-failure monitor that can react fast enough to ensure availability is really the hard part.<p>Would love to see more on aborting transactions next.
This model is somewhat similar to how Datomic models transactions -- Datomic uses immutable datoms and includes a transaction ID with each datom: "A datom consists of an entity, attribute, value and transaction (time)" (<a href="http://www.datomic.com/rationale.html" rel="nofollow">http://www.datomic.com/rationale.html</a>).<p>However, Datomic's design is based on a single, centralized transactor that pushes out all the new transaction information to an index that is distributed to all its clients (peers), whereas NBTA would enable distributed transactors.<p>See "The Datomic Data Model"
<a href="http://www.datomic.com/rationale.html#DataModel" rel="nofollow">http://www.datomic.com/rationale.html#DataModel</a><p>From Datomic's FAQ <a href="http://www.datomic.com/faq.html" rel="nofollow">http://www.datomic.com/faq.html</a> ...<p>How does Datomic provide ACID guarantees?<p>The transactor serializes all transactions; each transaction runs against a stable view of the database and succeeds or fails in its entirety. Transactions are not acknowledged until after they are logged in storage. Atomic read/modify/write operations can be performed by database functions, which run on the transactor within the transaction. Note that Datomic can provide ACID guarantees without utilizing read transactions or read locks, due to the presentation to the query engine(s) of the database as an immutable value.<p>---//---<p>Rich Hickey on Datomic's design...<p>Intro to Datomic (12 min)
<a href="http://www.youtube.com/watch?v=RKcqYZZ9RDY#" rel="nofollow">http://www.youtube.com/watch?v=RKcqYZZ9RDY#</a>!<p>The Design of Datomic (60 min)
<a href="http://www.infoq.com/presentations/The-Design-of-Datomic" rel="nofollow">http://www.infoq.com/presentations/The-Design-of-Datomic</a><p>The Datomic Architecture and Data Model (46 min)
<a href="https://vimeo.com/45136212" rel="nofollow">https://vimeo.com/45136212</a>
If you clear metadata, the following pathological case results in an inconsistent read:<p>* x = 0 and y = 0, starting out in the "good" state with metadata cleared;<p>* client A reads x with no metadata, then for whatever reason blocks or is delayed;<p>* client B writes x = 1 and y = 1, completes the transaction, and the metadata is cleared;<p>* client A reads y = 1.<p>This shouldn't be an issue unless you aggressively clear metadata and have a long-running client.
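The four steps above can be replayed as a toy simulation (names and structure are illustrative, not from the paper): two items whose metadata is cleared after commit, and a slow reader that ends up with a fractured view.

```python
# Starting "good" state with metadata cleared: x = 0, y = 0.
store = {"x": {"value": 0, "meta": None},
         "y": {"value": 0, "meta": None}}

# Client A reads x; with no metadata there is nothing to repair against.
a_x = store["x"]["value"]                    # A sees x = 0

# (A blocks or is delayed here.)

# Client B writes x = 1 and y = 1, completes the transaction,
# and the metadata is cleared again.
store["x"] = {"value": 1, "meta": None}
store["y"] = {"value": 1, "meta": None}

# Client A resumes and reads y.
a_y = store["y"]["value"]                    # A sees y = 1

# A observed (x, y) = (0, 1), a state that never existed:
# it was (0, 0) before B's transaction and (1, 1) after.
print(a_x, a_y)   # 0 1
```

With the metadata still present, A's second read could have detected that y.1 belongs to a transaction that also wrote x, and repaired accordingly; clearing it is what makes the fractured read invisible.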
In the mode where a client is responsible for moving writes from 'pending' to 'good', it wasn't clear to me what happens if the client dies before contacting all the servers in the second pass.<p>Does the data remain stable, or must some additional work be performed to correct the inconsistent state?