What Aphyr tested was not my "toy" example model, which was not even proposed as something to actually implement, but only to show that WAIT per se is neither broken nor OK: it is just a low-level building block. The consistency achieved depends on the whole system, and especially on the safety guarantees of the failover procedure.<p>What I proposed is a toy system as described here: <a href="https://gist.github.com/antirez/7901666" rel="nofollow">https://gist.github.com/antirez/7901666</a><p>It is a toy because it assumes a super-strong coordinator that can partition away instances, is never itself partitioned, can reconfigure clients magically, and so forth. Under those assumptions I believe the theoretical system is trivially capable of reaching linearizability.<p>Aphyr tested a different model, with an implementation that cannot even guarantee some of the model's weak assumptions (for example, the slave resets the replication offset to 0 when restarted), so I'm not sure what the result means.<p>I could test the actual model I proposed even with Redis, by manually following the steps that I outlined in the Redis mailing list thread. The point was: if you can guarantee certain properties in the system, data and higher offsets are always "transferred" to a majority of replicas, and the system becomes strongly consistent.<p>Those properties are hard to achieve in practice once you try to move the features of the mythical super-strong coordinator into the actual system, and this is why, for example, Raft uses epochs and other mechanisms to guarantee both safety and liveness.<p>Unfortunately the focus is on showing that other people are wrong, without caring where the discussion is headed.<p>--- EDIT ---<p>Btw, now that I have read the full post carefully, Aphyr also cherry-picked parts of the thread to construct a story that does not exist, as if I were going to implement strong consistency in Redis based on the proposed toy system, which was only useful to show that WAIT per se is not a system, just a building block. Note that yesterday I wrote the opposite in my blog: there is no interest in strong consistency in Redis Cluster.<p>Very unfair IMHO... I read only the analysis part at first, and thought this was just a "let's check this model anyway against the current implementation".
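For context, the building block in question is Redis's WAIT command, which takes the form WAIT numreplicas timeout and returns how many replicas acknowledged the preceding writes. A minimal redis-py sketch (connection details and key name are just placeholders) of using it the way described above, as a propagation check rather than a consistency guarantee:

    # Sketch only: WAIT reports propagation to replicas; whether the write
    # survives a failover depends on how the new master is chosen.
    import redis

    r = redis.Redis(host="localhost", port=6379)  # placeholder connection

    r.set("balance", 100)
    # Block until at least 2 replicas acknowledge, or 100 ms elapse.
    acked = r.execute_command("WAIT", 2, 100)
    if acked < 2:
        # The write may exist only on the master or a minority of replicas;
        # a failover that promotes a stale replica could still discard it.
        print("only %d replica(s) acknowledged" % acked)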
> Ultimately I was hoping that antirez and other contributors might realize why their proposal for a custom replication protocol was unsafe nine months ago, and abandon it in favor of an established algorithm with a formal model and a peer-reviewed proof, but that hasn’t happened yet. Redis continues to accrete homegrown consensus and replication algorithms without even a cursory nod to formal analysis.<p>That is kind of my feeling too. Redis is an outstanding product with a beautiful code base. This replication feature has been tough, though, partly due to external factors, as I mentioned in the previous post. Everyone and their cousin is talking about distributed databases; everyone likes CAP, CRDTs, vector clocks, Raft, Zookeeper and so on. It is hard to step up and say "Here, I have made this custom replication protocol." Everyone stares and asks, "Hey, where is your whitepaper, where are your partition-tolerance tests?" Five to seven years ago there would have been only nods and approval. The other aspect is that this is a database, so it potentially touches users' valuable data. If that data gets lost, whether by a bug, miscommunication in the docs, a bad default, anything, it will not be taken lightly.<p>In the end I think it is fine to have it for what it is, with warnings and disclaimers that data can be lost, and without papering over or hiding issues.<p>As an extra side note: simply put, partition tolerance is hard. Net-splits are the devil of the distributed world. Some claim they don't exist or don't happen often; others fear and tremble when their name is mentioned. When one does happen, it means resolving conflicts, throwing away user data, or killing your availability by stopping some nodes from accepting writes in order to provide consistency. This is a tough test (the one Aphyr runs) and not very many databases fare well on it. But it is good that these things are discussed.
To rephrase antirez from the previous thread:<p>People use Redis, in large part, for its time and space guarantees on data structure operations. (Without those, you might as well be using a serialized object store.) Strong consistency requires rollbacks, and the book-keeping necessary to do rollbacks throws away those time and space guarantees. So you can have strong consistency, or you can have Redis, but you don't get both.<p>But Redis Cluster is a compromise: something roughly good enough for <i>most cases</i> people actually use Redis for, while failing horribly at things Redis isn't used for anyway, and still providing Redis's time and space guarantees.<p>Theorists balk, because there are obvious places where Redis Cluster falls down, and they can demonstrate this. Engineers shrug, because Redis isn't being used in their companies in a way that makes those demonstrations relevant to their problems.<p>Most people who need Redis Cluster have already Greenspunned a Redis Cluster themselves, and they're already happily living with the compromise it entails. They'll gladly hand the support burden of writing cluster-management code upstream to antirez; it won't change any of the facts about the compromise.
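A rough illustration of where that book-keeping cost appears, as a hypothetical Python sketch (this is not how Redis is implemented): supporting rollback means keeping an undo record for every write until it is known to be durable, so even an O(1) operation like LPUSH carries extra per-write memory for as long as the write might need to be undone.

    # Hypothetical sketch, not Redis internals: rollback support means every
    # mutation must log enough state to reverse it until it is acknowledged.
    from collections import deque

    class RollbackList:
        def __init__(self):
            self.items = deque()
            self.undo_log = []                  # grows with every pending write

        def lpush(self, value):
            self.items.appendleft(value)        # the O(1) operation itself
            self.undo_log.append("undo_lpush")  # plus per-write book-keeping

        def rollback(self):
            # Discard writes that were never safely replicated.
            while self.undo_log:
                self.undo_log.pop()
                self.items.popleft()

        def commit(self):
            # Only once the writes are durable can the undo records be freed.
            self.undo_log.clear()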
A person who found themselves sympathetic to the kind of hand-wavey feel-good explanation of things in yesterday's Redis thread might find this conclusion kind of snotty:<p>> I wholeheartedly encourage antirez, myself, and every other distributed systems engineer: keep writing code, building features, solving problems–but please, please, use existing algorithms, or learn how to write a proof.<p>That person should be sure to note these experimental results:<p>> These results are catastrophic. In a partition which lasted for roughly 45% of the test, 45% of acknowledged writes were thrown away. To add insult to injury, Redis preserved all the failed writes in place of the successful ones.
Note that antirez's reply (in the comments) begins with "thanks to Aphyr for spending the time to try stuff, but the model he tried here is not what I proposed..."
As a gut check, if you're solving some replication problem and you'd consider using Paxos to solve the problem, be /very/ wary and reason extremely carefully about why your weaker solution will provide the same guarantees. Chances are, it will fail in certain cases of network outage or system failure.
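To make the failure mode concrete, here is a toy model (plain Python, not Redis code) of the kind of outcome the analysis reported: a write acknowledged by a majority can still be lost if the failover procedure is free to promote a replica that never saw it, which is the gap that epochs/terms in algorithms like Paxos and Raft exist to close.

    # Toy model, not Redis code: a majority-acknowledged write is lost when an
    # ad-hoc failover promotes a replica that never received it.
    nodes = {"A": [], "B": [], "C": []}   # A is the initial master

    # Client writes x=1; the master and one replica have it (2 of 3, a majority).
    nodes["A"].append("x=1")
    nodes["B"].append("x=1")

    # Partition: A and B become unreachable from the coordinator, which promotes
    # C with no epoch check and no requirement to pick the most up-to-date replica.
    new_master = "C"

    # When the partition heals, A and B resync from the new master, and the
    # acknowledged write is silently discarded.
    for n in ("A", "B"):
        nodes[n] = list(nodes[new_master])

    print(nodes)   # {'A': [], 'B': [], 'C': []} -- x=1 is gone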
Guys, use Redis for your <i>real-time data</i>. Why else would you care about the benefits of in-memory speed? Jesus. If a partition happened to my Redis setup, you know what I'd do? Trash the whole thing and start again.
I'm not personally very familiar with Redis or its HA tenders, but (based solely on reading this article) they seem to suffer from problems (improper handling of non-quorum situations) that have been solved with tools like Pacemaker and Corosync.<p>Has anyone attempted to use Pacemaker to wrangle Redis instances?