Jepsen: Testing the Partition Tolerance of PostgreSQL, Redis, MongoDB and Riak

76 points by sethev almost 12 years ago

4 comments

mjb almost 12 years ago
The exchange between Antirez and aphyr following the post about Redis Sentinel is a fascinating comparison between two engineering approaches. Antirez makes a qualitative argument (http://antirez.com/news/56, especially http://antirez.com/news/56#comment-910996445) about the behavior of the system in some 'real world' where complex network partitions are rare. On the other hand, aphyr made a much more theoretically sound argument (including using TLA+ to demonstrate the validity of his argument) in his post (http://aphyr.com/posts/287-asynchronous-replication-with-failover).

Despite having a huge amount of respect for Antirez and Redis, I strongly believe that the approach aphyr took is the one we are going to need as we build larger and more complex systems on unreliable infrastructure. Our engineering intuition, as excellent as it may be for single-node systems, almost always fails us with distributed systems. To get around this, we need to replace intuition. The tools that aphyr uses, such as TLA+ and carefully crafted counterexamples and diagrams, are an extremely good start in that direction. Getting a computer (in this case TLA+'s model checker TLC) to exhaustively test a design specification is very powerful. Comparing those results to the ones that we expected is even more powerful.

The comment made by Metaxis (http://antirez.com/news/56#comment-905001533) on Antirez's second reply is very good. Especially:

> I think your attempt to differentiate formal correctness and real world operations is deeply flawed and amounts to asserting anecdote - that what you have observed to be common makes for good design assumptions and better trade off decisions.

> Allow me to counter: Real world operations will inevitably tend to approach formal correctness in terms of observed failure modes. In other words, over time, you are more and more likely to see edge cases and freak occurrences that are predicted in theory but happen rarely in practice.

This closely matches my own experience. Just because I don't believe a network can behave in a particular way doesn't mean it won't. The real world is full of complex network partitions and Byzantine failures, and our systems need to be safe when they happen.
packetbeats almost 12 years ago
Fantastic in-depth article. In my opinion, the tests he performs show not so much the drawbacks of the tested systems as the fact that defining them in terms of the CAP theorem can be misleading. For example, a CP system should, in case of a partition, wait until the partition is resolved, potentially forever. This is not useful in practice, where some timeout is almost always used. This is why, if I'm interpreting the results correctly, not even Postgres running on a single node can claim to be CP.
mdellabitta almost 12 years ago
I think (╯°□°)╯︵ ┻━┻ and ヽ(´ー`)ノ need to be declared as constants in all logging libraries and used accordingly.
Roboprog almost 12 years ago
Interesting, but it needs a summary to help conceptualize the details it slogs through.