Less pdfblobby blog post <a href="https://www.daemonology.net/blog/2025-03-21-Chunking-attacks-on-Tarsnap.html" rel="nofollow">https://www.daemonology.net/blog/2025-03-21-Chunking-attacks...</a>
This is great, thank you! This has been on my wishlist for a few years:<p><a href="https://www.reddit.com/r/crypto/comments/7imejm/monthly_cryptography_wishlist_thread_december_2017/dr00e10/" rel="nofollow">https://www.reddit.com/r/crypto/comments/7imejm/monthly_cryp...</a><p>I took a stab at this problem, but was not sure whether it worked at all:<p><a href="https://gist.github.com/dchest/50d52015939a5772497815dcd33a7983" rel="nofollow">https://gist.github.com/dchest/50d52015939a5772497815dcd33a7...</a><p>It's a modified BuzHash with the following changes:<p>- The substitution table is pseudorandomly permuted (NB: like Borg).<p>- The initial 32-bit state is derived from the key.<p>- The window size varies slightly depending on the key (by ~1/4).<p>- The digest is scrambled with a 32-bit block cipher.<p>I also proposed adding (unspecified) padding before encrypting chunks to make it harder to discover their plaintext lengths. Glad to see I was on the right track :)
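The four changes listed above can be sketched roughly as follows. This is not the code from the linked gist: the class name `KeyedBuzHash`, the constants `BASE_WINDOW` and `BOUNDARY_BITS`, the key derivation via `random.Random`, and the toy Feistel network standing in for the "32-bit block cipher" are all assumptions made for illustration, and none of it has been analysed for security.

```python
import hashlib
import random

MASK32 = 0xFFFFFFFF
BASE_WINDOW = 64     # hypothetical base window size in bytes
BOUNDARY_BITS = 12   # hypothetical: a cut roughly every 4096 positions


def rotl32(x: int, r: int) -> int:
    r %= 32
    return ((x << r) | (x >> (32 - r))) & MASK32


class KeyedBuzHash:
    def __init__(self, key: bytes):
        # Illustration only: random.Random is NOT a cryptographic generator.
        rng = random.Random(hashlib.sha256(key).digest())
        # A fixed public base table, then a key-dependent permutation of it.
        self.table = random.Random(0).sample(range(1 << 32), 256)
        rng.shuffle(self.table)
        # Initial 32-bit state derived from the key.
        self.init_state = rng.getrandbits(32)
        # Window size varies with the key by up to about 1/4 of the base.
        self.window = BASE_WINDOW + rng.randrange(BASE_WINDOW // 4)
        self.round_keys = [rng.getrandbits(16) for _ in range(4)]

    def scramble(self, x: int) -> int:
        """Toy 4-round Feistel permutation on 32 bits (not a vetted cipher)."""
        left, right = x >> 16, x & 0xFFFF
        for k in self.round_keys:
            f = ((right ^ k) * 0x9E37 + (right >> 7)) & 0xFFFF
            left, right = right, left ^ f
        return (left << 16) | right

    def window_digest(self, window: bytes) -> int:
        """Reference (non-rolling) digest of one full window."""
        h = self.init_state
        for b in window:
            h = rotl32(h, 1) ^ self.table[b]
        return self.scramble(h)

    def cut_points(self, data: bytes) -> list[int]:
        """End offsets of windows whose scrambled digest hits the boundary mask."""
        n, t = self.window, self.table
        # After n steps the initial state contributes rotl32(init_state, n).
        const = rotl32(self.init_state, n)
        mask = (1 << BOUNDARY_BITS) - 1
        h, cuts = 0, []
        for i, b in enumerate(data):
            h = rotl32(h, 1) ^ t[b]
            if i >= n:
                h ^= rotl32(t[data[i - n]], n)  # drop the byte leaving the window
            if i >= n - 1 and (self.scramble(h ^ const) & mask) == 0:
                cuts.append(i + 1)
        return cuts
```

A real chunker would also enforce minimum and maximum chunk sizes; those are left out here to keep the rolling update visible.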
> I'm also exploring possibilities for making the chunking provably secure.<p>Seems like that’s possible[1] in a fairly straightforward manner; the question is whether you can do it without computing a PRF for each byte.<p>[1] Obviously you’re always going to leak the total data size and the approximate size of new data in each transfer.
Could this be mitigated by randomising the block upload order?<p>A fresh backup will be uploading thousands of blocks. You don't want to create all the blocks before uploading, but a buffer of a hundred might be enough?
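A bounded shuffle buffer of the kind suggested here could look roughly like this. The function name `shuffled_upload_order` and the default of 100 are taken from the comment's wording rather than from any real backup tool, so treat it as a sketch of the idea only.

```python
import secrets
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

_rng = secrets.SystemRandom()  # OS-backed randomness


def _pop_random(pool: list) -> object:
    # Swap a random element to the end, then pop it in O(1).
    j = _rng.randrange(len(pool))
    pool[j], pool[-1] = pool[-1], pool[j]
    return pool.pop()


def shuffled_upload_order(blocks: Iterable[T],
                          buffer_size: int = 100) -> Iterator[T]:
    """Yield blocks in a locally randomised order using a bounded buffer."""
    pool: list = []
    for block in blocks:
        pool.append(block)
        if len(pool) >= buffer_size:
            yield _pop_random(pool)
    while pool:  # drain what is left once the input ends
        yield _pop_random(pool)
```

A block can be uploaded at most `buffer_size - 1` positions earlier than it was produced, so the order is only scrambled locally; whether that is enough to help against the attack is exactly the open question in the comment.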
My reading is that the primary vector is based on the size of the chunks (due to deterministic chunking and length-preserving encryption). Would padding chunks with random-length data (prior to encryption) help mitigate this at the cost of additional storage (and complexity)?
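As a rough illustration of the padding idea, the sketch below prepends the true length and appends a random number of zero bytes before the chunk is handed to encryption. The names `pad_chunk` and `unpad_chunk`, the 4-byte length prefix, and the `MAX_PAD` bound are assumptions for this example, not how Tarsnap or any other tool does it.

```python
import secrets
import struct

MAX_PAD = 255  # hypothetical upper bound on padding bytes per chunk


def pad_chunk(chunk: bytes, max_pad: int = MAX_PAD) -> bytes:
    """Return length prefix + chunk + a random amount of zero padding."""
    pad_len = secrets.randbelow(max_pad + 1)  # uniform in 0..max_pad
    # The pad bytes can be zeros because the result is encrypted afterwards.
    return struct.pack(">I", len(chunk)) + chunk + bytes(pad_len)


def unpad_chunk(padded: bytes) -> bytes:
    """Recover the original chunk from pad_chunk() output (after decryption)."""
    if len(padded) < 4:
        raise ValueError("padded chunk too short")
    (length,) = struct.unpack_from(">I", padded)
    if length > len(padded) - 4:
        raise ValueError("length prefix exceeds available data")
    return padded[4:4 + length]
```

The storage cost is `max_pad / 2` bytes per chunk on average plus the 4-byte prefix, and an observer of the ciphertext length still learns the plaintext length to within `max_pad` bytes.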
Would SipHash be too slow? I think it would help mitigate the problem since you can key it to prevent known-plaintext attacks, right?<p>EDIT: or maybe this keyed rolling hash <a href="https://crypto.stackexchange.com/questions/16082/cryptographically-secure-keyed-rolling-hash-function" rel="nofollow">https://crypto.stackexchange.com/questions/16082/cryptograph...</a>
Borg discussion + wiki page:<p><a href="https://github.com/borgbackup/borg/discussions/8694" rel="nofollow">https://github.com/borgbackup/borg/discussions/8694</a>