My current project is a Huffman coder written in Zig, and it's been a lot of fun to build. The Huffman tree generation is simple, but there are a lot of nuances and variations to account for.<p>For example, a "canonical Huffman code" can save you space when encoding the table, through some clever trickery. In short, you encode the table as a count of how many bytes use each bit length, plus the list of bytes used in the file. You don't need to store the code patterns at all, since in the canonical algorithm the patterns are regenerated from the bit lengths. [1]<p>Right now I'm trying to implement the package-merge algorithm,[2] which will let me create the table without building an inefficient fully-allocated tree, and, more importantly, will let me limit the lengths of my codes (the maximum code length is n-1, where `n` is the size of your alphabet; working with bytes, 255-bit codes are obnoxious). Unfortunately, all the explanations of the algorithm I've found are very academic and mathematical, so I'm having trouble working it out.<p>Some of you might be interested in the short video Tom Scott made about Huffman coding.[3]<p>1. <a href="https://en.wikipedia.org/wiki/Canonical_Huffman_code#Encoding_the_codebook" rel="nofollow">https://en.wikipedia.org/wiki/Canonical_Huffman_code#Encodin...</a><p>2. <a href="https://en.wikipedia.org/wiki/Package-merge_algorithm" rel="nofollow">https://en.wikipedia.org/wiki/Package-merge_algorithm</a><p>3. <a href="https://www.youtube.com/watch?v=JsTptu56GM8" rel="nofollow">https://www.youtube.com/watch?v=JsTptu56GM8</a>
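<p>In case a concrete sketch of the canonical regeneration helps, here's roughly how it works (Python rather than Zig, just for illustration; the function name and dict interface are my own invention):

```python
def canonical_codes(lengths):
    """Rebuild canonical Huffman codes from bit lengths alone.

    lengths: dict mapping symbol -> code length in bits.
    Returns dict mapping symbol -> code as a bit string.
    """
    codes = {}
    code = 0
    prev_len = 0
    # Canonical rule: assign codes in order of (length, symbol);
    # each code is the previous code + 1, left-shifted whenever
    # the code length increases.
    for sym, ln in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= ln - prev_len
        codes[sym] = format(code, "0{}b".format(ln))
        code += 1
        prev_len = ln
    return codes

# e.g. lengths {a: 3, b: 3, c: 2, d: 1} yield the prefix-free set
# d -> "0", c -> "10", a -> "110", b -> "111"
```

Since the assignment is fully determined by the sorted (length, symbol) order, both sides can regenerate identical codes from just the bit lengths, which is why the table never needs to store the patterns.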
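<p>And for anyone else wrestling with package-merge, here's my current understanding of the coin-collector formulation as a sketch (again in Python for brevity, not my actual Zig; verify against the Wikipedia article before trusting it):

```python
def package_merge(freqs, max_len):
    """Length-limited Huffman code lengths via package-merge.

    freqs: dict mapping symbol -> frequency.
    max_len: maximum allowed code length in bits.
    Returns dict mapping symbol -> code length.
    """
    n = len(freqs)
    if n == 1:
        return {next(iter(freqs)): 1}
    assert (1 << max_len) >= n, "max_len too small for this alphabet"

    # Each item is (weight, tuple of contained symbols).
    symbols = sorted((f, (s,)) for s, f in freqs.items())
    level = list(symbols)  # start at the deepest level
    for _ in range(max_len - 1):
        # Package adjacent pairs (dropping a trailing odd item),
        # then merge the packages with a fresh copy of the symbols.
        pairs = [(a[0] + b[0], a[1] + b[1])
                 for a, b in zip(level[0::2], level[1::2])]
        level = sorted(symbols + pairs)

    # Take the 2(n-1) cheapest items from the top level; each
    # appearance of a symbol adds one bit to its code length.
    lengths = {s: 0 for s in freqs}
    for _, syms in level[: 2 * (n - 1)]:
        for s in syms:
            lengths[s] += 1
    return lengths
```

The resulting lengths satisfy the Kraft inequality with equality, so they can be fed straight into the canonical-code step.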