Andy at Backblaze here. To put a fine point on it, the three main factors in which drives we use are cost, availability, and reliability. We have control over reliability, as our systems are designed to deal with drive failure. That leaves the market to decide cost and availability. Assuming a competitive market, we can buy the drives that optimize those factors.
I win. The 8TB drives I used to populate my NAS scored 0.10% better than the 0.81% average. That means my NAS is better than everyone else's.<p>The big takeaway from these numbers is how dramatically low they are compared to drives 5/10/25 years ago. If you treat them reasonably, modern drives are rock solid.
I really enjoy reading the Backblaze analysis on this every year; it's such a valuable and interesting data set. I do have one suggestion: it would be great to go one step further and add confidence intervals for the AFR estimates. E.g. if you see 0 drive failures, you don't really expect an AFR of 0 (that is not the maximum likelihood estimate), and the range of AFRs you'd expect for each drive narrows as a function of the number of drive days (e.g. if one drive has 1 day of use, we know basically nothing about its AFR, so the confidence interval would be ~0-100% (not really, but still quite large), or it would be smaller if you wanted to add a prior on AFR).<p>It would also be interesting to see the time dependence (i.e. does AFR really look U-shaped over the lifetime of a drive?). That would require a dataset with every drive used, along with (1) the number of active drive days, (2) a flag to indicate if the drive has failed, and of course (3) which kind of drive it is. Does Backblaze offer this level of granularity?<p>EDIT: They offer the raw data dumps!<p><a href="https://www.backblaze.com/b2/hard-drive-test-data.html#downloading-the-raw-hard-drive-test-data" rel="nofollow">https://www.backblaze.com/b2/hard-drive-test-data.html#downl...</a><p>Backblaze, god bless you.
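For anyone who wants to try this on the raw data: below is a minimal sketch of the exact (Garwood) Poisson confidence interval for an annualized failure rate, using only the standard library. It treats failures as a Poisson process over the observed drive-days, which is the usual simplification for rare, independent failures; the drive-day totals in the usage note are made-up examples.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson random variable with mean mu."""
    if k < 0:
        return 0.0
    return math.exp(-mu) * sum(mu**i / math.factorial(i) for i in range(k + 1))

def _solve_decreasing(f, lo, hi):
    """Bisection root-find for a decreasing f with f(lo) > 0 > f(hi)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def afr_confidence_interval(failures, drive_days, alpha=0.05):
    """Exact (Garwood) 1-alpha confidence interval for the annualized
    failure rate, modeling failures as Poisson over drive-days."""
    bracket = failures + 50.0  # generous upper bracket for the mean
    upper = _solve_decreasing(
        lambda mu: poisson_cdf(failures, mu) - alpha / 2, 0.0, bracket)
    if failures == 0:
        lower = 0.0  # with zero failures the lower bound is exactly zero
    else:
        lower = _solve_decreasing(
            lambda mu: poisson_cdf(failures - 1, mu) - (1 - alpha / 2),
            0.0, bracket)
    per_year = 365.0 / drive_days
    return lower * per_year, upper * per_year
```

With 0 failures over 100,000 drive-days the point estimate is 0% AFR, but the 95% upper bound comes out around 1.35%, which is exactly the caveat the zero-failure rows in the table need.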
I've recently joined the WD shucking crowd so, unfortunately, there are no stats for the drives I'm running. The cost/GB has just gotten too ridiculously low on shucked drives, and my array is large enough that a failure or two isn't the end of the world. Oh, and the 3-2-1 backup rule.<p>I've always really enjoyed reading Backblaze's reports, though, and have made past buying decisions based on their information.<p>I wonder if a community-sourced version of these reports might be possible? A small app that checks SMART data and sends it to a central repository for displaying stats? Many in the self-hosted/homelab crowd are running these shucked drives, so there has to be a large pool of stats out there if it can be gathered.
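As a sketch of what that little app's collection step could look like: smartmontools can emit JSON (`smartctl -a -j /dev/sdX`), and the snippet below pulls out the fields a central repository would probably want. The field names follow the smartmontools 7.x JSON schema as I understand it; the exact layout varies by drive and version, so treat this as an assumption rather than a spec.

```python
import json

def summarize_smart(report_json):
    """Extract the stats-relevant fields from `smartctl -a -j` output.

    Assumes the smartmontools 7.x JSON layout: top-level model_name /
    serial_number, power_on_time.hours, and an ata_smart_attributes.table
    list of per-attribute rows with a name and a raw value.
    """
    r = json.loads(report_json)
    attrs = {
        row["name"]: row["raw"]["value"]
        for row in r.get("ata_smart_attributes", {}).get("table", [])
    }
    return {
        "model": r.get("model_name"),
        "serial": r.get("serial_number"),
        "power_on_hours": r.get("power_on_time", {}).get("hours"),
        "attributes": attrs,
    }
```

A real collector would then POST that dict to the central repository on a timer; the privacy-sensitive field is the serial number, which could be hashed before upload while still letting the server deduplicate drives.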
One of my main questions is how different these stats are for hard drives that are not running all the time. My personal experience over a decade in an underfunded lab was that if you took a drive offline and left it for a couple of years, you had a 10-40% chance of it failing.
Interesting how HGST has consistently low failure rates [0]. What makes these drives so reliable? Japanese fixation on quality? Specifics of Hitachi design?<p>[0] <a href="https://www.backblaze.com/blog/wp-content/uploads/2020/08/Chart-Q2-2020-MFR-AFR-1024x679.jpg" rel="nofollow">https://www.backblaze.com/blog/wp-content/uploads/2020/08/Ch...</a>
Why wouldn't Backblaze use only HGST then? What's the point of buying Seagates, which were the most unreliable hard drives 15-20 years ago, and whose stats still show higher failure rates? (I still don't buy them because of the bad taste they left.)
It's interesting that, despite Seagate being relatively bad compared to HGST, Backblaze keeps installing more and more Seagate drives and not that many HGST drives. I guess the analysis that's missing here is the cost per drive hour?
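One way to frame that missing analysis, with entirely hypothetical prices and labor costs (Backblaze hasn't published theirs): amortize the purchase price plus expected replacement overhead into a cost per TB-year. Under those assumptions, a cheaper drive can come out ahead despite a worse AFR.

```python
def cost_per_tb_year(price_usd, capacity_tb, afr, service_years,
                     replace_cost_usd=25.0):
    """Rough amortized cost per TB-year: purchase price plus expected
    replacement labor over the service life. Ignores the replacement
    drive itself (assume warranty). All inputs are hypothetical
    illustrations, not Backblaze's actual numbers."""
    expected_replacements = afr * service_years
    total = price_usd + expected_replacements * replace_cost_usd
    return total / (capacity_tb * service_years)

# Hypothetical comparison over a 5-year service life: a cheaper drive
# with a 1.5% AFR vs. a pricier one with a 0.5% AFR.
cheap = cost_per_tb_year(price_usd=200.0, capacity_tb=8.0,
                         afr=0.015, service_years=5)
reliable = cost_per_tb_year(price_usd=260.0, capacity_tb=8.0,
                            afr=0.005, service_years=5)
```

With these made-up numbers the cheap drive wins comfortably, which would explain the purchasing pattern: when the system already tolerates failures, small AFR differences are worth much less than a price gap.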
Are there trends for AFR by drive age? i.e. for a specific drive or manufacturer, what is the failure rate for drives that have been in use <i>n</i> years? It'd be interesting to see how the failure rate goes up/down as they get older.
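The raw data dumps are enough to answer this. Here's a sketch using pandas against the daily-snapshot schema (one row per drive per day, with `date`, `serial_number`, and a 0/1 `failure` column); age is approximated as days since the drive first appears in the data, which undercounts for drives deployed before the dataset starts.

```python
import pandas as pd

def afr_by_age(df):
    """Annualized failure rate bucketed by drive age in whole years.

    Expects the Backblaze daily-snapshot schema: one row per drive per
    day with 'date', 'serial_number', and 'failure' (0/1) columns.
    Each row counts as one drive-day of exposure.
    """
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Approximate deploy date as the drive's first appearance.
    first_seen = df.groupby("serial_number")["date"].transform("min")
    df["age_years"] = ((df["date"] - first_seen).dt.days // 365).astype(int)
    grouped = df.groupby("age_years").agg(
        drive_days=("failure", "size"),
        failures=("failure", "sum"),
    )
    grouped["afr_pct"] = grouped["failures"] / grouped["drive_days"] * 365 * 100
    return grouped
```

Filtering `df` by model first would give the per-manufacturer age curves; with enough drive-days per bucket, a U-shape (or its absence) should be visible directly.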
Why did they stop buying WDC drives? Is there a known issue with them?<p>Also, why can't I find HGST drives at decent prices? On Amazon they are either refurbished or crazy expensive new.
The unit volumes are high enough that I think this is north of "sample size too small for statistical validity." So the question in my mind is: how <i>significant</i> is the variance in the failure rates?<p>Is this underlying manufacturing error tolerances, or shipping/deployment effects, or is this... aliens?
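That significance question can be checked directly by conditioning on the total failure count: if model 1 shows k1 failures over d1 drive-days and model 2 shows k2 over d2, then under the null of equal Poisson failure rates, k1 given n = k1 + k2 is Binomial(n, d1 / (d1 + d2)). A stdlib sketch of that exact conditional test (the drive-day figures in the test are invented, not from the report):

```python
import math

def equal_rate_pvalue(k1, d1, k2, d2):
    """Exact conditional test for equal failure rates between two models.

    Under equal Poisson rates, conditioning on the total n = k1 + k2
    failures makes k1 Binomial(n, d1 / (d1 + d2)). The two-sided
    p-value sums every outcome no more likely than the observed one.
    Intended for modest failure counts; large counts would want a
    normal approximation instead.
    """
    n = k1 + k2
    p = d1 / (d1 + d2)
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[k1]
    # Tiny tolerance so float noise doesn't exclude ties with the observed pmf.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-12)))
```

Plugging in failure counts and drive-days from the report's table would show per-pair whether the spread between manufacturers is real or plausibly sampling noise; with drive counts in the tens of thousands, even fractions of a percent of AFR tend to separate cleanly.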