My current project is a Huffman coder written in Zig, and it's been a lot of fun to build. The Huffman tree generation is simple, but there are a lot of nuances and variations to account for.<p>For example, a "canonical Huffman code" can save you space when encoding the table, through some clever trickery. In short, you encode the table as a count of how many bytes use each bit length, plus the list of bytes used in the file. You don't need to store the code patterns at all, since in the canonical algorithm the patterns are regenerated from the bit lengths. [1]<p>Right now I'm trying to implement the package-merge algorithm,[2] which will let me create the table without building an inefficient fully-allocated tree, and, more importantly, will let me limit the lengths of my codes (the maximum code length is n-1, where `n` is the size of your alphabet; working with bytes, 255-bit codes are obnoxious). Unfortunately, all the explanations of the algorithm I've found are very academic and mathematical, so I'm having trouble working it out.<p>Some of you might be interested in the short video Tom Scott made about Huffman coding.[3]<p>1. <a href="https://en.wikipedia.org/wiki/Canonical_Huffman_code#Encoding_the_codebook" rel="nofollow">https://en.wikipedia.org/wiki/Canonical_Huffman_code#Encodin...</a><p>2. <a href="https://en.wikipedia.org/wiki/Package-merge_algorithm" rel="nofollow">https://en.wikipedia.org/wiki/Package-merge_algorithm</a><p>3. <a href="https://www.youtube.com/watch?v=JsTptu56GM8" rel="nofollow">https://www.youtube.com/watch?v=JsTptu56GM8</a>
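<p>In case a concrete sketch of the canonical regeneration helps, here's roughly how it works (Python rather than Zig, just for illustration; the function name and dict interface are my own invention):

```python
def canonical_codes(lengths):
    """Rebuild canonical Huffman codes from bit lengths alone.

    lengths: dict mapping symbol -> code length in bits.
    Returns dict mapping symbol -> code as a bit string.
    """
    codes = {}
    code = 0
    prev_len = 0
    # Canonical rule: assign codes in order of (length, symbol);
    # each code is the previous code + 1, left-shifted whenever
    # the code length increases.
    for sym, ln in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= ln - prev_len
        codes[sym] = format(code, "0{}b".format(ln))
        code += 1
        prev_len = ln
    return codes

# e.g. lengths {a: 3, b: 3, c: 2, d: 1} yield the prefix-free set
# d -> "0", c -> "10", a -> "110", b -> "111"
```

Since the assignment is fully determined by the sorted (length, symbol) order, both sides can regenerate identical codes from just the bit lengths, which is why the table never needs to store the patterns.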
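<p>And for anyone else wrestling with package-merge, here's my current understanding of the coin-collector formulation as a sketch (again in Python for brevity, not my actual Zig; verify against the Wikipedia article before trusting it):

```python
def package_merge(freqs, max_len):
    """Length-limited Huffman code lengths via package-merge.

    freqs: dict mapping symbol -> frequency.
    max_len: maximum allowed code length in bits.
    Returns dict mapping symbol -> code length.
    """
    n = len(freqs)
    if n == 1:
        return {next(iter(freqs)): 1}
    assert (1 << max_len) >= n, "max_len too small for this alphabet"

    # Each item is (weight, tuple of contained symbols).
    symbols = sorted((f, (s,)) for s, f in freqs.items())
    level = list(symbols)  # start at the deepest level
    for _ in range(max_len - 1):
        # Package adjacent pairs (dropping a trailing odd item),
        # then merge the packages with a fresh copy of the symbols.
        pairs = [(a[0] + b[0], a[1] + b[1])
                 for a, b in zip(level[0::2], level[1::2])]
        level = sorted(symbols + pairs)

    # Take the 2(n-1) cheapest items from the top level; each
    # appearance of a symbol adds one bit to its code length.
    lengths = {s: 0 for s in freqs}
    for _, syms in level[: 2 * (n - 1)]:
        for s in syms:
            lengths[s] += 1
    return lengths
```

The resulting lengths satisfy the Kraft inequality with equality, so they can be fed straight into the canonical-code step.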