Some questionable choices are made in this optimization.<p>The motivation is that there is so much IO activity that the periodic RAID checks can't complete.<p>It is unclear from the article whether the RAID checks ever completed across the 17TiB of data. Instead, they disabled the periodic checks and switched to verifying each page of data as it is read in. The two are not equivalent, and both should be used for important data.<p>Finding corrupt data only when you try to read it can leave corruption sitting undetected for a long time, possibly to the point where your backups no longer go back far enough to restore an uncorrupted copy. Underpinning all of this is also a switch to RAID 0... It is the fastest option, but they are putting a lot of faith in that NVMe config surviving that kind of workload.<p>Hope they have good backups...<p>EDIT: A good way to solve this is to spin up a temporary server, restore your backups onto it, and run the full data checks there. When they succeed, you have also validated your backup and restore process along with the integrity of the files.
You still want enough IO headroom on the primary server to complete the RAID checks, which means not reaching for RAID 0 purely for performance.
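The verify-on-restore idea above can be sketched as a small script. This is a hedged illustration, not the article's setup: the restore directory, file name, and `manifest.sha256` are all hypothetical stand-ins for a checksum manifest you would generate at backup time; the md scrub commands in the trailing comment are the standard Linux md interface and would only apply on a real array.

```shell
#!/bin/sh
# Sketch: verify a restored backup on a scratch server by checking file
# contents against a sha256 manifest shipped with the backup.
# All paths and file names here are illustrative assumptions.
set -eu

RESTORE_DIR=$(mktemp -d)   # stand-in for the scratch server's restore mount

# Pretend this file came out of the restore step.
printf 'important data\n' > "$RESTORE_DIR/data.bin"

# The manifest would normally be produced on the primary at backup time;
# we generate it here only so the sketch is self-contained.
( cd "$RESTORE_DIR" && sha256sum data.bin > manifest.sha256 )

# Full-content verification on the scratch server -- this reads and hashes
# every byte, unlike a read-path check that only covers pages you happen
# to access.
if ( cd "$RESTORE_DIR" && sha256sum --check --quiet manifest.sha256 ); then
    STATUS="restore verified"
    echo "$STATUS"
else
    echo "corruption detected" >&2
    exit 1
fi

# On a real md array you would also trigger a full scrub, e.g.:
#   echo check > /sys/block/md0/md/sync_action
#   cat /sys/block/md0/md/mismatch_cnt   # 0 => array is consistent
rm -rf "$RESTORE_DIR"
```

Because the checks run on a throwaway machine, the primary server's IO budget is untouched, and a failed verification tells you about a bad backup before you need it.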