Correlated Failures in Storage Systems

41 points by logicalstack, about 4 years ago

2 comments

rsync, about 4 years ago
We thought a lot about correlated storage failures - especially with regard to SSDs - as we rebuilt our infrastructure circa 2012/2013.

In the end, the low-hanging fruit - or, the biggest actionable takeaway - was that *when we build boot mirrors out of SSDs, they should not be identical SSDs*.

This was a hunch I had, personally, and I think experience and, now, results like these, bear it out.

Consider: an SSD can fail *in a logical way* - not because of physical stress or mechanical wear, which has all kinds of random noise in the results, but due to a particular sequence of usage. If the two SSDs are mirrored, it is possible that they receive *identical* usage sequences over their lifetime...

...which means they can fail identically - perhaps simultaneously.

Nothing fancy or interesting about the solution: all rsync.net storage arrays have boot mirrors that mix either the current-generation Intel SSD with the previous-generation Intel SSD, *or* mix an Intel SSD with a Samsung SSD.
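A toy Monte Carlo sketch makes the correlated-failure argument concrete. The failure probabilities below are invented purely for illustration (none of this comes from rsync.net); the point is only that a usage-triggered bug that hits both halves of an identical mirror at once dominates the independent double-failure case by orders of magnitude:

```python
import random

# Toy model: probability that BOTH halves of a boot mirror die in the same
# week. All rates below are made-up illustration values, not measured data.
WEEKS = 520          # ~10-year horizon
P_RANDOM = 0.0005    # weekly chance of an independent hardware failure
P_LOGIC_BUG = 0.001  # weekly chance a usage-sequence-triggered bug fires

def mirror_lost(identical: bool) -> bool:
    """Return True if both drives fail in the same week."""
    for _ in range(WEEKS):
        # An identical mirror feeds both drives the same usage sequence,
        # so a logic bug kills both sides simultaneously.
        if identical and random.random() < P_LOGIC_BUG:
            return True
        # Otherwise only independent failures can coincide.
        if random.random() < P_RANDOM and random.random() < P_RANDOM:
            return True
    return False

def estimate(identical: bool, trials: int = 10_000) -> float:
    return sum(mirror_lost(identical) for _ in range(trials)) / trials

print(f"identical SSDs: {estimate(True):.4f}")   # dominated by the shared bug
print(f"mixed SSDs:     {estimate(False):.4f}")  # rare independent coincidence
```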
Psychlist, about 4 years ago
Also: different NAS hosts, RAID cards, etc. Those have correlated failure modes too.

My personal backup strategy of buying a different backup drive every time seems wiser the more I learn.

At work we have two different NAS setups, each full of a different brand of near-identical drives. But what we have been doing is buying a few new drives every quarter and rotating them into the NAS boxes. So they're all WD 6TB Black or whatever, but of 12 drives we now have 4 original ones, then a pair 3 months newer than that, a pair 6 months newer, and so on. The "old" drives go into random stuff around the office, because we employ engineers and they all seem to like having their own little 2-4 drive NAS boxes for "important stuff". That is in many ways fine; we just have to regularly coach them on making sure the stuff they're actually working on is on our NAS, where it gets backed up. (We host a GitLab instance, for example, so their code and project docs are in that.)
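A quick sketch shows how that rotation settles into the age spread described. The schedule details (two new drives per quarter, retiring the two oldest) are my reading of the comment, not a stated policy:

```python
# Toy model of quarterly drive rotation in a 12-bay NAS.
# Assumption (not stated in the comment): exactly two drives are
# replaced each quarter, oldest first.
NAS_SLOTS = 12
ages = [0] * NAS_SLOTS            # drive ages in quarters, all bought at once

for quarter in range(1, 9):       # simulate two years of rotation
    ages = [a + 1 for a in ages]  # every drive ages one quarter
    # Retire the two oldest drives to office duty, install two new ones.
    ages = sorted(ages, reverse=True)[2:] + [0, 0]
    print(f"Q{quarter}: drive ages (quarters) = {ages}")
```

By Q4 the array holds four oldest drives plus pairs staggered one quarter apart - the "4 original ones, then a pair 3 months newer" pattern from the comment - so no two quarters' worth of drives share a manufacturing batch.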