
Copysets and Chainsets: A Better Way to Replicate (2014)

34 points by nodivbyzero almost 8 years ago

1 comment

GauntletWizard almost 8 years ago
Precisely where they went wrong:

    In practice, the speed of recovery is typically bottlenecked by the
    incoming bandwidth of the recovering server, which is easily exceeded
    by the outgoing read bandwidth of the other servers, so this limitation
    is typically not a big deal in practice.

If you're recovering to *one* server, you're going to have a bad time. With random distribution, you recover to *every* server, equally, over a very short period of time. The tradeoff is that you'll have a lot of churn, as temporary failures cause a lot of data to be re-replicated and the extra copies deleted as those servers come back online. On the other hand, this helps balance your utilization and load.

The actual insight is that you want failure-domain anti-affinity. That is, if you have 1000 servers on 50 network switches, you want your replica selection algorithm to pick not three different machines at random, but three different *switches* at random. If you have three AZs, put one replica of each copy in each of the three. Copysets can provide this, but, as stated in the article, they're much more likely to give you Achilles heels: a typical failure won't hurt and won't cause any unavailability, but the wrong one takes you down hard, with N% data loss rather than thousandths of a percent data loss.

In short: failures happen. Recovering from them is what matters, not convincing yourself that they can't happen.
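A minimal sketch of the failure-domain anti-affinity placement the comment describes: choose distinct failure domains (switches or AZs) first, then pick a machine inside each, so no two replicas share a domain. This is an illustrative sketch, not the paper's copyset algorithm; the choose_replicas function, the servers_by_switch topology map, and the switch/server naming are assumptions made for the example.

    import random
    from collections import defaultdict

    def choose_replicas(servers_by_switch, replication_factor=3):
        """Pick one server from each of `replication_factor` distinct switches.

        servers_by_switch: dict mapping switch id -> list of server ids.
        Raises if there are fewer switches than replicas requested, since
        anti-affinity cannot be satisfied in that case.
        """
        if len(servers_by_switch) < replication_factor:
            raise ValueError("not enough failure domains for anti-affinity")
        # Choose the failure domains (switches) first, then a machine in each,
        # so no two replicas ever land behind the same switch.
        switches = random.sample(list(servers_by_switch), replication_factor)
        return [random.choice(servers_by_switch[sw]) for sw in switches]

    # Example: 1000 servers spread across 50 switches, as in the comment.
    topology = defaultdict(list)
    for i in range(1000):
        topology[f"switch-{i % 50}"].append(f"server-{i}")

    print(choose_replicas(topology))

The same structure applies at the AZ level: swap the switch map for an AZ map and the selection guarantees one replica per zone.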