The biggest lesson to take away from this is probably that they *thought* they knew how to test an SSD, but were quite obviously clueless:

> we run a fairly comprehensive set of block-level tests using fio, consisting of both sequential and random asynchronous reads and writes straight to the disk. Then we throw a few timed runs of the venerable dd program at it.

Running dd as a benchmark is a major red flag. It shows that they didn't know what they were doing with fio and didn't trust its results. They later started using IOzone and a custom-written tool to accomplish stuff they should have done with fio in their initial testing.

They also did not mention pre-conditioning the drives or ensuring that their tests ran long enough to reach a steady state. This is one of the most important aspects of enterprise SSD testing, and they would have known that if they'd consulted any outside resources on the subject instead of making up their own testing guidelines from a position of extreme ignorance about the fundamentals of the hardware they were using and the details of their own workload.

They really should stop calling any of their tests "comprehensive".
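For what it's worth, fio on its own covers both the preconditioning pass and a long steady-state run. A minimal sketch of that kind of job (the device name, queue depth, and run times here are assumptions, not anything from the article):

    # Precondition: overwrite the whole device sequentially, twice,
    # so the FTL is in a realistic state before measuring anything.
    fio --name=precondition --filename=/dev/sdX --direct=1 \
        --ioengine=libaio --rw=write --bs=128k --iodepth=32 --loops=2

    # Steady state: 4k random writes with a long ramp before the timed
    # measurement window, so garbage collection is already in full swing.
    fio --name=steady-state --filename=/dev/sdX --direct=1 \
        --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
        --time_based --ramp_time=1800 --runtime=3600 --group_reporting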
This is the problem IMHO:

> We also looked up whether our HBA used TRIM in its current configuration. It turns out, in RAID mode, the HBA did not support TRIM. We did do some trim-enabled testing with a different machine, but these results are hard to compare fairly. In any case, we can't currently enable TRIM on our production systems.

In our experience SSD write performance goes to sh*t if you don't regularly TRIM them. Running fstrim once a day is enough to keep them healthy.

RAID cards that don't pass TRIM through are a big problem for us too...

(Experience from my day job at a hosting provider)
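For anyone setting this up: on distributions with systemd, util-linux already ships a trim timer, and a plain cron job works everywhere else. A sketch (the 3 a.m. schedule is just an example, adjust to taste):

    # systemd: enable the bundled timer (weekly by default; override it
    # if you want daily runs like we do)
    systemctl enable --now fstrim.timer

    # or a cron entry, e.g. /etc/cron.d/fstrim
    0 3 * * * root /sbin/fstrim -av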
This reminds me of testing I did years ago on ... CD-ROMs. Funny how lessons from old technology can apply to new technology.

Around 15 years ago my company did a Linux distribution on CDs: KRUD. It was updated monthly, and we had something like 400 subscribers. For various reasons we burned these CDs in house on a cluster I built.

We would burn, eject, read and checksum, and if the read test succeeded we would ship it out. We found some users with some discs had problems reading them. We contacted these users and paid them to return the CDs and did further testing on them.

Our initial test was using dd, and we found that the discs that were not obviously damaged in shipping would tend to pass tests on some of our CD-ROM drives but fail on others. Even when they did succeed, they would tend to take longer than normal.

I wrote a new test program that, instead of using dd, directly issued SCSI read commands and timed every one. It would then count the number of reads that were "slow" (like 2x normal) and those that were "really slow" (like 5x), and if these got over a certain threshold we would throw away the disc.

Being able to time the raw operations was incredibly useful, and it seems like it could have shown the authors of this article problems before the drives were deployed to production.

Except they didn't really seem to do very thorough testing of the drives. Running stress testing on a 1TB drive for an hour seems pretty short.

Also, at that same job we did hosting. We found that if we burned in disks by reading/writing to them 10 times ("badblocks -svw -p 10"), we would almost never experience drive failures on the Hitachi drives we were using. If we didn't do this, the drives had a fairly high chance of falling out of the RAID array in production.

As drive sizes increased from 20GB to 200GB to 1TB, these tests started taking weeks to complete. But they were totally worth it.
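The per-read timing idea is easy to approximate these days without raw SCSI commands. A rough sketch using dd with direct I/O (the device, chunk size, and thresholds are made up and would need calibrating per drive):

    #!/bin/bash
    # Read a device one chunk at a time, timing each read and counting
    # the ones that come back suspiciously slowly.
    DEV=/dev/sr0                 # device under test (assumption)
    BS=$((1024*1024))            # 1 MiB per read
    BASE_MS=50                   # assumed "normal" read time in ms
    slow=0; very_slow=0
    chunks=$(( $(blockdev --getsize64 "$DEV") / BS ))
    for ((i=0; i<chunks; i++)); do
        start=$(date +%s%N)
        if ! dd if="$DEV" of=/dev/null bs="$BS" skip="$i" count=1 \
                iflag=direct status=none; then
            very_slow=$((very_slow+1)); continue
        fi
        ms=$(( ($(date +%s%N) - start) / 1000000 ))
        if   (( ms > 5*BASE_MS )); then very_slow=$((very_slow+1))
        elif (( ms > 2*BASE_MS )); then slow=$((slow+1))
        fi
    done
    echo "slow=$slow very_slow=$very_slow out of $chunks reads"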
Flash memory has three operations: read, write and erase, the latter two destructive. If you pretend an SSD is a hard disk with only read and write, you go through all sorts of contortions. Sometimes you fall flat on your face, as seen here.

Why don't operating systems treat SSDs more like flash memory, and why doesn't the file system cooperate with the underlying hardware instead of pretending it's a disk? For home use the pretence may even work, but in a demanding environment the extra complexity will invariably fail.

This is a genuine question, I'm an amateur here.
Plugging SATA drives into a SAS HBA may not be optimal:
"SAS/SATA expanders combined with high loads of ZFS activity have proven conclusively to be highly toxic"
http://garrett.damore.org/2010/08/why-sas-sata-is-not-such-great-idea.html
One lesson here is that when reusing a previous test setup you ought to look for assumptions you made which are no longer valid.

If they'd been starting from scratch, while thinking about modern SSDs, it's quite likely they wouldn't have built an application load tester using files containing only dots.

But as it was an existing system, it didn't get the same amount of attention.
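The dot-filled files matter because plenty of controllers compress data transparently, so a repetitive payload ends up benchmarking the compressor rather than the flash. A quick, rough way to see whether a drive is doing this (paths and sizes are placeholders; note /dev/urandom itself can be the bottleneck on very fast drives):

    # Highly compressible: a gigabyte of the same byte
    tr '\0' '.' < /dev/zero | dd of=/mnt/test/dots.dat \
        bs=1M count=1024 iflag=fullblock oflag=direct

    # Incompressible: a gigabyte of random data
    dd if=/dev/urandom of=/mnt/test/random.dat \
        bs=1M count=1024 oflag=direct

A large throughput gap between the two suggests the controller is compressing, and that any benchmark payload needs to look like real data.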
I built my home system early this year using the Samsung 960 Evo 1TB M.2. Actual speeds were nowhere near advertised speeds until I enabled the write-back cache on the drive, which gave me pause about data-persistence reliability. AFAIK the Samsung drivers (as opposed to the MS drivers I originally used) just turn this on without needing to be twiddled in settings.

Just to confirm: I have seen the behaviour described here, with write-back caching making an enormous difference on the Samsung EVO products in particular.
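On Linux, for comparison, one way to check whether an NVMe drive's volatile write cache is currently on (the device name is an assumption):

    # Feature 0x06 is the NVMe Volatile Write Cache setting
    nvme get-feature /dev/nvme0 -f 0x06 -H

With the cache off, every write has to reach the flash before it is acknowledged, which would explain a large gap between advertised and observed numbers.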
You could give blkreplay a go next time you're deciding which disks to buy.
I find the additional effort is worth it, but ymmv.
Use one of the shipped loads for a quick test, but you really want to run blktrace against your current setup and feed that data to blkreplay.
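The capture side is straightforward. A sketch of recording an hour of production activity (device name and duration are just examples); blkreplay ships helpers for converting blktrace output into its load format, if I remember right, so check its docs for that last step:

    # Record one hour of block-level activity on the busy device
    blktrace -d /dev/sdb -o myload -w 3600

    # Turn the binary per-CPU traces into text for conversion/replay
    blkparse -i myload -o myload.txt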
Great article, easy to follow considering it's far away from my normal domain.

I noticed they didn't mention any brands by name, though. Why is that?
Sounds like they are observing transparent data compression in the SSD controller and FTL. SandForce controllers even made a marketing point of it back in the day. It manifests as faster IO with repetitive data, along with reduced flash wear.
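fio can make this visible directly, since it lets you control how compressible the write buffers are; running the same job with both settings should show a gap on a compressing controller. A sketch (device and run time are assumptions):

    # Incompressible payload: buffers refilled with random data on every write
    fio --name=incompressible --filename=/dev/sdX --direct=1 \
        --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
        --refill_buffers --time_based --runtime=300

    # Then rerun with --buffer_compress_percentage=90 for a highly
    # compressible payload and compare the reported bandwidth.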
One of my favourite parts of this article is how Elliot Thomas describes himself as a "Software Engineer".

We may be writing software, but without a working knowledge of hardware it's not worth much!