Ask HN: Will we ever be able to recover BitTorrent files from the piece hashes?

8 pointsby jonesncabout 3 years ago

sha1 hashes are not secure, and collisions can be practically produced by attackers.I'm wondering if these attack strategies could one day be used to find collisions for the pieces in a bittorrent archive to effectively "recover" the data that was originally used to create the bittorrent archive.Let's say the piece size is 4 KiB, that means there are 2^(8x4x1024) possible combinations of bytes for each piece. That is a number with almost 10 thousand digits. Could these sha1 collision attack strategies be used to cut down on the number of combinations we'd have to comb through to find the matching 4 KiB of binary that original produced the hash?Either way, I'm interested to learn more about how this works. Thanks.

5 comments

jml7c5about 3 years ago

The problem is that a hash is not even close to a one-to-one mapping of input data <-> output hash. There are a truly enormous number of possible 4 KiB strings that can produce a given hash. Assuming SHA-1 is evenly distributed, there are 2^(8*4*1024) / 2^160 possibilities (which is 2^32608). You could find some 4 KiB piece that works, but it would almost certainly not be the original. Even if your torrent had a single file with a single missing 4 KiB piece, and even if you had a machine that could spit out every configuration that would pass the hash check, you'd still have 2^32608 theoretically valid files to test.(Note: I'm not taking the time to think the specifics through so my math may be incorrect, but the conclusion that the search space would remain huge is sound.)

TechBro8615about 3 years ago

If you put enough monkeys on type writers, they'll create Ubuntu 18.04 LTS.

wmfabout 3 years ago

What you're talking about is a preimage not a collision which is a different kind of attack.

评论 #30784752 未加载

simplicialsetabout 3 years ago

It's probably easier with quantum computers: <a href="https://eprint.iacr.org/2020/213.pdf" rel="nofollow">https://eprint.iacr.org/2020/213.pdf</a>.

评论 #30788361 未加载

rdtwoabout 3 years ago

As in hashing being there ultimate compression… seems unlikely