I have a versioned backup system using a combination of Time Machine (locally) and Arq+B2 (remote) across two laptops. I've done spot checks to verify that my backups aren't corrupted, but I can't figure out how users are supposed to verify this in general. How do people do this? I'm asking in a personal context, but I'd also be curious to know what this looks like in an enterprise/business context.
I use rsync to generate and compare checksums of the original against the backup:<p><pre><code> rsync -cavin --info=name2 --no-perms --no-owner --no-group /local/data/path/ user@host:/remote/data/path/ | grep '^[<>]'
</code></pre>
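grep keeps only the lines carrying the difference markers that -i (itemize changes) puts in the output, i.e. files that rsync would transfer because they differ or are missing on the other side.<p>-c makes rsync compare files by checksum instead of by size and modification time.<p>-n is dry-run mode, so rsync doesn't actually transmit any files.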
In my own personal use case, my backups include a file that lists every backed-up file along with its xxhash.[1] So verification is as simple as running "xxhsum --check" on this file. This is a bit slow, but it's much faster than something like SHA-256, and I don't know what scheme would be significantly faster. I just threw this together to solve my problem of verifying backups, without doing much research into the problem.<p>The backup is to physical media, which makes this scheme easy. I don't know if or how this could be applied to online backups.<p>[1]: <a href="https://cyan4973.github.io/xxHash/" rel="nofollow">https://cyan4973.github.io/xxHash/</a>
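For anyone wanting to try the manifest scheme described above, here is a minimal sketch. It assumes a POSIX-like shell and a find/xargs pair that supports -print0/-0. The function names, the MANIFEST filename, and the fallback to sha256sum when xxhsum isn't installed are my own assumptions for illustration, not necessarily how the setup above is built.

```shell
#!/bin/sh
# Hypothetical helpers: make_manifest, verify_manifest, and the MANIFEST
# filename are illustrative names, not part of any tool.

# Prefer xxhsum; fall back to sha256sum (slower, but it writes the same
# "hash  path" line format and accepts the same --check option).
if command -v xxhsum >/dev/null 2>&1; then
    HASHER=xxhsum
else
    HASHER=sha256sum
fi

# Write one "hash  path" line per regular file under the backup root.
# The manifest itself is excluded because the redirect creates it
# before find starts walking the tree.
make_manifest() {
    root=$1
    (
        cd "$root" || exit 1
        find . -type f ! -path ./MANIFEST -print0 | xargs -0 "$HASHER" > MANIFEST
    )
}

# Re-hash every file listed in the manifest. The exit status is non-zero
# if any file is missing or its hash no longer matches.
verify_manifest() {
    root=$1
    ( cd "$root" && "$HASHER" --check MANIFEST )
}
```

You would run make_manifest on the backup root right after writing a backup, and verify_manifest on the same root whenever you want to check it. Storing relative paths means the manifest stays valid if the media is mounted somewhere else later.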