PingCAP CTO here. I really enjoyed working with Kyle over these two months, and I want to share a few thoughts.
Before the "official" Jepsen test, the TiDB team had been writing its own Jepsen test cases for a long time, and that code became a good starting point for this test. However, test cases written by a database's own developers will inevitably have some blind spots. That is why, even if your own Jepsen tests pass, I still recommend having Kyle's team do the official test.<p>During testing, we had no way to influence what Kyle tested. Usually, Kyle would only tell us when abnormal behaviour occurred. Our main job was to analyze the cause and then fix it.
Here are some tips for anyone who wants to do a Jepsen test in the future:<p>1. Carefully check your documentation before testing to make sure it is consistent with your database implementation.<p>2. It is best for someone on your team to understand Jepsen, build your own Jepsen test environment internally, and test in parallel with Kyle's team.<p>3. A simplified deployment process is very helpful.<p>4. Be sure which version of your DB you are testing before you start. Usually, you can select one when starting the test and another before the end.<p>I really enjoyed working with Kyle. I would say Kyle has a good sense of humor :), and I hope to have another chance to cooperate with Kyle and his team in the future! Thank you.
@aphyr I love reading these discussions about Jepsen! How would an individual contributor get involved with Jepsen and contribute during free time? Is most of the work related to adding public Jepsen tests for each database or is there work available with the framework as well? I'd love to learn more about Jepsen and distributed systems testing!
This is great. I've been evaluating CockroachDB for a high-write data situation, and TiDB came up as an alternative. CRDB having a Jepsen test and TiDB not having one tilted the decision pretty heavily toward CRDB. Looks like I can really give both a look now.
Most Jepsen reports seem to find glaring flaws, including subsystems that add complexity while reducing correctness, like the “auto-retry” features here, or novel consensus schemes that just aren’t airtight and therefore technically don’t work.
This is a pretty disappointing result.<p>I've noted TiDB's weak transaction isolation guarantees before, which were not well documented. In particular, they claimed phantom reads were not possible, which this test shows isn't true.<p>That said, it looks like these issues aren't fundamental design problems and can be fixed.
This is great news. The only blocker is lack of FK constraints. I know this is a performance issue, but it's a blocker for those expecting it to behave like MySQL. We have a lot of bad queries (unfortunately) that rely on constraints for correctness. It's too dangerous to do any migration that eliminates them without at least a reduced-performance setting that warns about violations, which we could run for a month or two before committing 100%.<p>It pains me to say this, but strong ties to China company-wide will likely mean my org never adopts this, FK constraints aside. A victim of the trade war? Yes. But many of our clients are strictly "no Chinese vendors for libraries or dependencies", even if the data is on our own servers.<p>I'm not sure if anything can be done at this point, but building an isolated US-based org would help immensely with adoption. The cap table also being mostly Chinese investors is enough to stop most US companies at the DD phase.
@aphyr:<p>"We have not evaluated filesystem or disk faults with TiDB, and cannot speak to crash recovery. Nor have we tested dynamic membership changes. Both might be fruitful avenues of investigation for future research."<p>How do you choose what components to test once you get past obvious stuff like the highest claimed isolation level?
See the companion blog from PingCAP:
<a href="https://pingcap.com/blog/tidb-passes-jepsen-test-for-snapshot-isolation-and-single-key-linearizability/" rel="nofollow">https://pingcap.com/blog/tidb-passes-jepsen-test-for-snapsho...</a>
Hi @aphyr. I'm a great fan of your work with Jepsen although I know very little about the fault tolerance of distributed systems. Are there any resources you would recommend on the subject? I am an application developer so I don't see myself writing a database in the future. Still it would be great to learn about the concepts.<p>Cheers!