An Empirical Analysis of Hardware Failures on a Million Consumer PCs

150 points by mbafk almost 13 years ago

13 comments

cs702 almost 13 years ago
Very useful -- I will take this analysis into account when it's time to upgrade my current personal machine or configure the next one! Thank you for posting this here.

The only thing I would have wanted to see but didn't in this analysis is how failure rates vary for different types of disk subsystem -- specifically, traditional hard drives versus the newer solid-state devices. I suspect, but don't know for sure, that the latter have much, much lower real-world failure rates in the first 30 days of total accumulated CPU time (TACT).

The authors openly suggest that the sharp difference in failure rates between desktop and laptop machines may be due in part to their disk subsystems: "Laptops are between 25% and 60% less likely than desktop machines to crash from a hardware fault over the first 30 days of observed TACT. We hypothesize that the durability features built into laptops (such as motion-robust hard drives) make these machines more robust to failures in general." Alas, the authors don't delve any further into it.

I'd like to see hard data comparing the real-world failure rates of *both* desktops and laptops using traditional versus solid-state disk subsystems.
mrb almost 13 years ago
When Microsoft, Google, or some university publish analysis of hardware failures across large numbers of machines, they always anonymize hardware vendors ("vendor A", "vendor B").

I understand the reasons (not alienating your hardware vendors), but will there ever be a research group who will disclose vendor names? Heck, I would *pay* for this information.
wazoox almost 13 years ago
Among other interesting insights:

* a machine that crashed once is 100 times more likely to crash again; the more it crashes, the more it's prone to fail again.
* overclocking significantly reduces reliability. One CPU vendor (AMD or Intel, but unspecified) is much worse in this regard, too.
* conversely, underclocking improves reliability.
* branded computers are more reliable than beige boxes.
* laptops are more reliable than desktops.
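The "100 times more likely to crash again" point is a statement about conditional probability: the chance of a second crash, given a first one, is the first-crash chance scaled by that factor. A minimal sketch of the arithmetic follows. The function name and the first-crash probability are hypothetical and chosen only for illustration; the only figure taken from the comment above is the factor of 100.

```python
def recurrence_probability(first_crash_prob: float, multiplier: float = 100.0) -> float:
    """Probability of a second crash given a first one.

    Scales the first-crash probability by `multiplier` and caps the
    result at 1.0, since a probability cannot exceed certainty.
    """
    if not 0.0 <= first_crash_prob <= 1.0:
        raise ValueError("first_crash_prob must be between 0 and 1")
    return min(1.0, first_crash_prob * multiplier)


# Hypothetical baseline, NOT a figure from the paper or the thread.
HYPOTHETICAL_FIRST_CRASH_PROB = 1 / 200

second = recurrence_probability(HYPOTHETICAL_FIRST_CRASH_PROB)
print(f"first crash: {HYPOTHETICAL_FIRST_CRASH_PROB:.3%}, crash again: {second:.1%}")
```

Under that assumed baseline, a rare first failure turns into roughly even odds of a repeat, which is why a machine that has crashed once is worth watching.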
ChrisNorstrom almost 13 years ago
I'm having a hard time coming to terms with "Laptops less likely to crash from hardware fault than desktops".

Everything we've learned from experience, surveys, and PC World magazines has shown the opposite. Heat kills hardware, and laptops have their hardware packed together so closely that it generates lots of heat. Back then I remember reading something like 1 in 4 laptops fail in the first 3 years, which was very believable. At the time I was in college for game design & development. All 80 guys in our class had laptops from HP (with, get this... Pentium 4s in them). Those laptops had a LOT of problems. They were basically portable heaters.

So I guess laptops now have either much better cooling, much cooler CPUs, or a combination. OR PCs are just terribly cooled.
josephturnip almost 13 years ago
Interesting stuff. You can improve reliability by running your system at a lower speed. Here's a blog post with a summary of some of the conclusions of the paper above: http://grano.la/blog/2012/06/improve-the-reliability-of-your-pc/ (Disclaimer: that's my company's blog)

One question I still have is whether the switching of CPU frequencies has any effect, or if it is only the average speed that correlates to the reliability. Anecdotal evidence suggests that this is the case, but it could be an area for further research.
kristaps almost 13 years ago
Interesting, too bad the power supplies could not be controlled in their setup, as a wonky power supply can unleash all kinds of gremlins that look like failures in components down the line.
Zenst almost 13 years ago
Interesting read, though why can't Microsoft just tell me that my CPU or HD or memory is borking and suggest I RMA it, instead of asking every time "have you applied the latest updates", which I get to click as unhelpful?

The most important thing in a PC for reliability, I have found, is a good PSU. It really does make a difference on the hardware side, as you give your kit cleaner power. Add a UPS/surge protector and you can double the lifetime of your kit. At least in my experience, the difference has been noticeable.
Hoff almost 13 years ago
The copy at Microsoft Research is offline.

Here's another copy of the paper:

http://eurosys2011.cs.uni-salzburg.at/pdf/eurosys2011-nightingale.pdf
acqq almost 13 years ago
There are a lot of insights in the paper, but I'd really like to know about this:

"The table shows that CPUs from Vendor A are nearly 20x as likely to crash a machine during the 8 month observation period when they are overclocked, and CPUs from Vendor B are over 4x as likely"

Obviously that's a 5x difference between Intel and AMD in how much overclocking raises the chance of an unstable system, but they don't say which one is better. Does anybody know?
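One caveat on the 20x versus 4x comparison: each figure in the quoted passage is a multiplier relative to that vendor's own non-overclocked machines, so dividing them gives a ratio of multipliers, not necessarily a ratio of absolute crash probabilities. A minimal sketch of the distinction is below. The baseline rates are hypothetical placeholders, since the quote gives none; only the two multipliers come from the quoted text, rounded to 20 and 4.

```python
def overclocked_rate(baseline_rate: float, multiplier: float) -> float:
    """Crash probability when overclocked: baseline probability times multiplier."""
    return baseline_rate * multiplier


# Multipliers from the quoted passage ("nearly 20x", "over 4x"), rounded.
MULT_A = 20.0
MULT_B = 4.0

# Ratio of the two multipliers: how much harder overclocking hits Vendor A
# relative to its own baseline than it hits Vendor B.
ratio_of_multipliers = MULT_A / MULT_B

# Hypothetical baselines, NOT from the paper: if Vendor B's stock parts
# crashed five times as often as Vendor A's, the overclocked rates would match.
baseline_a = 0.001
baseline_b = 0.005

rate_a = overclocked_rate(baseline_a, MULT_A)
rate_b = overclocked_rate(baseline_b, MULT_B)

print(f"ratio of multipliers: {ratio_of_multipliers}")
print(f"overclocked crash rate, A: {rate_a:.3f}  B: {rate_b:.3f}")
```

With equal baselines the absolute rates would also differ by the same factor as the multipliers; the sketch only shows that the quoted numbers alone do not settle it.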
hollerith almost 13 years ago
The result most surprising to me is that laptops are between 25% and 60% less likely than desktop machines to crash from a hardware fault during the first 30 days' worth of measurements.

The much larger weight and volume of desktops would seem to make them easier to cool.
hollerith almost 13 years ago
Too bad CPU temperature was not part of the collection of data used in the study.
latch almost 13 years ago
Is there a compelling reason for this to be a PDF rather than HTML? I'm genuinely curious.
stcredzero almost 13 years ago
I've always said that smart hardware tinkerers *underclock*. It produces less heat, and results in a quieter machine. I always suspected it improves reliability.