sha1 hashes are not secure, and collisions can be practically produced by attackers.<p>I'm wondering if these attack strategies could one day be used to find collisions for the pieces in a bittorrent archive to effectively "recover" the data that was originally used to create the bittorrent archive.<p>Let's say the piece size is 4 KiB, that means there are 2^(8x4x1024) possible combinations of bytes for each piece. That is a number with almost 10 thousand digits. Could these sha1 collision attack strategies be used to cut down on the number of combinations we'd have to comb through to find the matching 4 KiB of binary that original produced the hash?<p>Either way, I'm interested to learn more about how this works. Thanks.
The problem is that a hash is not even close to a one-to-one mapping of input data <-> output hash. There are a truly enormous number of possible 4 KiB strings that can produce a given hash. Assuming SHA-1 is evenly distributed, there are 2^(8*4*1024) / 2^160 possibilities (which is 2^32608). You could find <i>some</i> 4 KiB piece that works, but it would almost certainly not be the original. Even if your torrent had a single file with a single missing 4 KiB piece, and even if you had a machine that could spit out every configuration that would pass the hash check, you'd still have 2^32608 theoretically valid files to test.<p>(Note: I'm not taking the time to think the specifics through so my math may be incorrect, but the conclusion that the search space would remain huge is sound.)