I don't know if this is similar to https://www.researchgate.net/publication/10744818_Chain_Letters_and_Evolutionary_Histories, which, if I recall correctly, drew an evolutionary tree of old-school postal chain letters using normalized compression distance.

I've also wondered whether gzip is the best choice of compressor here. It compresses block by block, and the decompression tables are stored inside the file itself. It's not terribly hard to write a Huffman encoder where the character frequency table resides in a separate file. Huffman coding isn't block-based, so it's inefficient compared to gzip as a compressor, but normalized compression distance isn't really reliant on best-possible compression.
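For concreteness, the distance itself is just compressed sizes plugged into a ratio. A minimal Python sketch, assuming gzip as the compressor (the function names are mine; note gzip's fixed header overhead means even identical inputs score slightly above zero, and its 32KB window limits it on long inputs):

    import gzip, os

    def c(data):
        # Compressed length stands in for the (uncomputable) Kolmogorov complexity.
        return len(gzip.compress(data))

    def ncd(x, y):
        # NCD(x, y) = (C(x+y) - min(C(x), C(y))) / max(C(x), C(y))
        cx, cy = c(x), c(y)
        return (c(x + y) - min(cx, cy)) / max(cx, cy)

    a = b"the quick brown fox jumps over the lazy dog " * 20
    print(ncd(a, a))                    # near 0: concatenating adds almost nothing
    print(ncd(a, os.urandom(len(a))))   # near 1: random bytes share no structure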
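And here's a sketch of the side-table idea, where only the bitstream would count toward the distance; everything below is hypothetical illustration, not code from the paper:

    import heapq
    from collections import Counter

    def huffman_codes(freqs):
        # Map each symbol to a bit string, given a {symbol: count} table.
        # Heap entries: (weight, tiebreaker, {symbol: code so far}).
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        if len(heap) == 1:              # degenerate single-symbol input
            return {sym: "0" for sym in freqs}
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, i, c2 = heapq.heappop(heap)
            # Merging two subtrees prefixes 0/1 onto their existing codes.
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, i, merged))
        return heap[0][2]

    def compress(text):
        freqs = Counter(text)
        codes = huffman_codes(freqs)
        bits = "".join(codes[ch] for ch in text)
        # freqs would be written to a side file; the payload is only
        # the bitstream, so the table's size never enters the distance.
        return freqs, bits

    freqs, bits = compress("abracadabra")
    print(len(bits), "bits;", sorted(freqs.items()))

If every letter in a comparison shared one frequency table, C(x), C(y), and C(xy) would all be plain bitstream lengths, which I think is the appeal of keeping the table out of the payload.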