TechEcho

Usenix ATC best student paper award on distributed storage

24 points, by sqrtnlogn, almost 12 years ago

2 comments

tomp, almost 12 years ago
TL;DR: Datacenter operators incur a significant cost if, after a cluster-wide power outage, some nodes fail permanently; finding the chunks of data that are lost (i.e. all replicas failed) is a large fixed cost, so it is in their interest to reduce the probability of data loss at the expense of increasing the magnitude of data loss (i.e. you lose data less often, but when you do, you lose more of it).

> The probability of data loss is minimized when each node is a member of exactly one copyset. For example, assume our system has 9 nodes with R[eplication] = 3 that are split into three copysets: {1, 2, 3}, {4, 5, 6}, [and] {7, 8, 9}. Our system would only lose data if nodes 1, 2 and 3, nodes 4, 5 and 6, or nodes 7, 8 and 9 fail simultaneously.

> In contrast, with random replication and a sufficient number of chunks, any combination of 3 nodes would be a copyset, and any combination of 3 nodes that fail simultaneously would cause data loss.

In the scheme above, when a single node fails there are only 2 other nodes from which a new replacement node can bootstrap. The authors therefore relax the constraint that each node belongs to exactly one copyset, which slightly increases the probability of data loss but speeds up recovery from partial failures.
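The 9-node example in the quote can be checked by brute force. This sketch counts, for each scheme, how many of the C(9, 3) = 84 possible simultaneous 3-node failures lose data; the copyset list is the one from the comment, and the random-replication count assumes enough chunks that every 3-node combination holds all replicas of some chunk:

```python
from itertools import combinations

N, R = 9, 3
nodes = range(1, N + 1)

# Copyset scheme from the comment: each node is in exactly one copyset.
copysets = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]

# A simultaneous failure of R nodes loses data iff those nodes form a copyset.
failing_triples = list(combinations(nodes, R))
fatal_copyset = sum(1 for t in failing_triples if set(t) in copysets)

# Random replication with many chunks: every R-node combination is a copyset,
# so every simultaneous triple failure loses (some) data.
fatal_random = len(failing_triples)

print(fatal_copyset, fatal_random)  # 3 84
```

So the copyset scheme loses data in only 3 of 84 failure patterns instead of all 84, at the cost that each of those 3 patterns destroys a third of the cluster's chunks rather than a few.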
eigenrick, almost 12 years ago
It seems that this is just a structured way to formally keep more copies of your data, when what you're trying to avoid is a rack-level event removing availability of your replicas.

Ceph, described here: http://ceph.com/papers/weil-thesis.pdf, does just that by letting you include the structure of your datacenter in the pseudorandom, deterministic placement algorithm it uses for placing reads and writes.
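The placement idea the comment attributes to Ceph (CRUSH) can be sketched in miniature. This is not Ceph's actual algorithm, just a hash-based toy with a hypothetical rack map, showing the two properties at issue: placement is deterministic (any client recomputes it with no directory lookup), and replicas land in distinct racks so a rack-level event cannot remove all of them:

```python
import hashlib

# Hypothetical datacenter layout: rack name -> nodes in that rack.
racks = {
    "rack-a": ["a1", "a2", "a3"],
    "rack-b": ["b1", "b2", "b3"],
    "rack-c": ["c1", "c2", "c3"],
}

def h(*parts):
    """Deterministic pseudorandom weight derived from the object id and a salt."""
    key = "/".join(parts).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def place(obj_id, replicas=3):
    """Pick `replicas` racks by hash order, then one node within each rack,
    so no single rack holds more than one replica of the object."""
    chosen_racks = sorted(racks, key=lambda r: h(obj_id, r))[:replicas]
    return [max(racks[r], key=lambda n: h(obj_id, n)) for r in chosen_racks]

print(place("chunk-0042"))  # same answer on every client, one node per rack
```

The real CRUSH algorithm generalizes this to arbitrary hierarchies (rows, racks, hosts), weighted devices, and stable remapping when devices are added or removed, but the deterministic, topology-aware selection is the same basic trick.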