Congrats on the community blog launch! <3<p>>We would love to see someone in the etcd community integrate the etcd Jepsen tests directly into the existing etcd release pipeline.<p>I consider this to be an issue of higher priority than any of the bugs they just found, because it would help ensure preventable bugs don't crop up in the future. It's shocking to me that Jepsen goes through all that effort and then very few projects build a permanent pipeline for it. It's debatable whether these bugs would have existed if a Jepsen pipeline had been consistently in use from the 0.4.x days. I'm sure it's no simple task, but neither is a lot of the existing testing infrastructure for etcd.
> This is, apparently, not correct. Asking for revision 0 causes etcd to stream updates beginning with whatever revision the server has now, plus one, rather than the first revision. Asking for revision 1 yields all changes. This behavior was not documented.<p>I worked on an alternative etcd implementation and had to work around this assumption as well. It is technically documented in the proto[0], and numeric 0 is of course "unset" or "default" in proto3 land.<p>One thing I would like to see tested is nested transactions where one child txn mutates something and then a sibling child txn uses that something. I've found the implementation of that case lacking.<p>0 - <a href="https://github.com/etcd-io/etcd/blob/53f15caf73b9285d6043009fa64c925d5a8f573c/etcdserver/etcdserverpb/rpc.proto#L684" rel="nofollow">https://github.com/etcd-io/etcd/blob/53f15caf73b9285d6043009...</a>
There's also <a href="https://etcd.io/blog/jepsen-343-results/" rel="nofollow">https://etcd.io/blog/jepsen-343-results/</a>, via <a href="https://news.ycombinator.com/item?id=22191925" rel="nofollow">https://news.ycombinator.com/item?id=22191925</a>.
<i>This is, apparently, not correct. Asking for revision 0 causes etcd to stream updates beginning with whatever revision the server has now, plus one, rather than the first revision. Asking for revision 1 yields all changes. This behavior was not documented.</i><p>Whoops, looks suspiciously like someone tested the revision integer for truthiness to see if something was passed.
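To make the suspected failure mode concrete, here is a minimal sketch of how a truthiness check on an integer revision could produce the behavior quoted above. This is not etcd's code: the function names `resolve_start_buggy` and `resolve_start_explicit`, and the idea of modeling "unset" as `None`, are assumptions made up for illustration.

```python
from typing import Optional


def resolve_start_buggy(start_revision: int, current_revision: int) -> int:
    """Hypothetical handler that tests the revision for truthiness.

    0 is falsy, so an explicit request for revision 0 is indistinguishable
    from "no revision passed" and falls through to "start from now".
    """
    if not start_revision:
        return current_revision + 1
    return start_revision


def resolve_start_explicit(start_revision: Optional[int], current_revision: int) -> int:
    """Hypothetical alternative that models "unset" as None instead of 0."""
    if start_revision is None:
        return current_revision + 1
    # Assumption for this sketch: revisions start at 1, so 0 is clamped to 1.
    return max(start_revision, 1)


if __name__ == "__main__":
    current = 41
    print(resolve_start_buggy(0, current))        # starts after the current revision
    print(resolve_start_buggy(1, current))        # starts from the first revision
    print(resolve_start_explicit(0, current))     # starts from the first revision
    print(resolve_start_explicit(None, current))  # starts after the current revision
```

The same ambiguity arises with a plain proto3 integer field, where an omitted value and an explicit 0 look identical to the receiver, so the sketch applies whether the cause was a truthiness test or the wire format's default.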
The corresponding Jepsen post: <a href="http://jepsen.io/analyses/etcd-3.4.3" rel="nofollow">http://jepsen.io/analyses/etcd-3.4.3</a><p>I do no work at all in this area, but I love these reports. They're examples of well-written, clear, "engineer-mind" reports that we would all do well to emulate.
I mostly see etcd being used to store metadata & configuration data for distributed systems.<p>Can etcd also be used as a general distributed database like FoundationDB or ScyllaDB? If so, how does it compare to those other options?
How many nodes can etcd handle without a noticeable decay in performance? Their FAQ says 7, but has anybody used it with more nodes in a distributed app other than k8s? Assuming most of the operations are get and watch (i.e. write/read <<< 1.0), how big a cluster, in terms of number of nodes, can we scale up to?
For the impatient: “The etcd key-value store is a distributed database based on the Raft consensus algorithm. In our 2014 analysis, we found that etcd 0.4.1 exhibited stale reads by default. We returned to etcd, now at version 3.4.3, to investigate its safety properties in detail. We found that key-value operations appear to be strict serializable, and that watches deliver every change to a key in order. However, etcd locks are fundamentally unsafe, and those risks were exacerbated by a bug which failed to check lease validity after waiting for a lock.”
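The lease bug mentioned in that summary can be illustrated with a toy model on a simulated clock. This is a sketch under stated assumptions, not etcd's lock implementation: the `Lease` class, the `acquire_lock` function, and the `recheck_lease` flag are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    expires_at: float  # simulated clock time at which the lease lapses

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def acquire_lock(lease: Lease, now: float, wait: float, recheck_lease: bool) -> bool:
    """Return True if the caller is told it holds the lock.

    `wait` is the simulated time spent blocked behind the previous holder.
    With recheck_lease=False the caller is told it holds the lock even if
    its own lease lapsed while it was waiting.
    """
    now_after_wait = now + wait
    if recheck_lease and not lease.is_valid(now_after_wait):
        return False
    return True


if __name__ == "__main__":
    lease = Lease(expires_at=10.0)
    # The wait (15s) outlasts the lease (10s).
    print(acquire_lock(lease, now=0.0, wait=15.0, recheck_lease=False))
    print(acquire_lock(lease, now=0.0, wait=15.0, recheck_lease=True))
```

Rechecking the lease only closes the gap described as a bug. Per the summary, the locks remain fundamentally unsafe either way, since a lease can still lapse after the check while the client believes it holds the lock.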