Those who forget history are doomed to repeat it. Just seven years ago Crucial sold tens of thousands of their "M4" SSDs with a firmware bug that made them fail after 5184 hours: <a href="https://www.anandtech.com/show/5424/crucial-provides-a-firmware-update-for-m4-to-fix-the-bsod-issue" rel="nofollow">https://www.anandtech.com/show/5424/crucial-provides-a-firmw...</a><p>Do they still not test these things with artificially incremented counters?
According to this page, the SMART hour counter is only 16 bits, and rollover should be harmless:<p><a href="http://www.stbsuite.com/support/virtual-training-center/power-on-hours-rollover" rel="nofollow">http://www.stbsuite.com/support/virtual-training-center/powe...</a><p>If you look elsewhere on the Internet, you'll find people with very old and working HDDs that have rolled over, so I suspect this bug is limited to a small number of drives.<p>(What that page says about not being able to reset it is... not true.)<p>Likewise, I'm skeptical of "neither the SSD nor the data can be recovered" --- they just want you to buy a new one.<p>Tangentially related, I wonder how many modern cars will stop working once the odometer rolls over.
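To illustrate why rollover of an <i>unsigned</i> 16-bit counter is harmless: it just wraps back to zero, which the host can shrug off. A minimal sketch (assuming the firmware masks the counter to 16 bits; the function name is mine, not from any SMART spec):

```python
MASK = 0xFFFF  # 16-bit unsigned counter

def tick(hours: int) -> int:
    """Advance an unsigned 16-bit power-on-hours counter, wrapping at 65536."""
    return (hours + 1) & MASK

h = 65535      # counter at its maximum
h = tick(h)
print(h)       # 0 -- the drive keeps working; SMART simply restarts counting
```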
> HPE was notified by a Solid State Drive (SSD) manufacturer [...]<p>That's a curious bit of context. It seems to imply they're shifting some of the blame onto their manufacturer. It makes me wonder if this firmware is 100% HPE-specific, or if there's a 2^15-hours bug about to bite a bunch of other product lines.
>The issue affects SSDs with an HPE firmware version prior to HPD8 that results in SSD failure at 32,768 hours of operation (i.e., 3 years, 270 days 8 hours). After the SSD failure occurs, neither the SSD nor the data can be recovered. In addition, SSDs which were put into service at the same time will likely fail nearly simultaneously.<p>Looks like some sort of run time stored in a signed 2-byte integer. Oops.
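Speculation on the failure mode, not from the bulletin: if the hour count is reinterpreted through a signed 16-bit integer, hour 32,768 suddenly reads as a negative run time, which downstream firmware logic may not handle. A quick way to see the reinterpretation:

```python
import ctypes

def as_int16(value: int) -> int:
    """Reinterpret a value as a signed 16-bit (two's complement) integer."""
    return ctypes.c_int16(value).value

print(as_int16(32767))   # 32767  -- the last "good" hour
print(as_int16(32768))   # -32768 -- a negative run time at the next hour
```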
Would be nice if the standard firmware update mechanism on Linux (fwupd/LVFS) could be used for HPE products.<p><a href="https://fwupd.org/lvfs/vendors/" rel="nofollow">https://fwupd.org/lvfs/vendors/</a>
<a href="https://fwupd.org/lvfs/devices/" rel="nofollow">https://fwupd.org/lvfs/devices/</a>
Whatever the counter is, the fact that it's 32,768 instead of 65,536 suggests they used a <i>signed</i> 16-bit integer for something that presumably starts at zero and increases monotonically... Avoiding just that mistake would've given them twice as much time - nearly 7.5 years - which seems like it'd be longer than these drives would typically last anyway.
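The arithmetic behind "twice as much time" checks out:

```python
HOURS_PER_YEAR = 24 * 365          # 8760

signed_limit = 2 ** 15             # 32768 hours: where a signed 16-bit counter wraps
unsigned_limit = 2 ** 16           # 65536 hours: where an unsigned one would wrap

print(signed_limit / HOURS_PER_YEAR)    # ~3.7 years
print(unsigned_limit / HOURS_PER_YEAR)  # ~7.5 years
```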
>By disregarding this notification and not performing the recommended resolution, the customer accepts the risk of incurring future related errors.<p>How does this work legally? For one, how would HPE prove that the customer read the bulletin? I don't imagine they're sending these out via certified mail.
Probably related to <a href="https://news.ycombinator.com/item?id=21471997" rel="nofollow">https://news.ycombinator.com/item?id=21471997</a>