TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Building a data compression utility in Haskell using Huffman codes

232 点作者 lazamar11 个月前

15 条评论

mrkeen11 个月前
There exists an array-based, in-place algorithm for this, reducing the need to allocate trees and chase pointers.<p>I mention this only because, when I learned the tree-based approach at uni, I simply wasn&#x27;t aware that there was another way to do it, and I&#x27;m wondering how many of you that&#x27;s true for as well.<p>While the tree approach is intuitive and illuminating, it probably makes more sense to work with in-place arrays, since the situations when you care most about compression are probably the situations when you have a lot of data and want to run fast.<p><pre><code> In-Place Calculation of Minimum-Redundancy Codes Moffat, Katajainen. 1995. http:&#x2F;&#x2F;hjemmesider.diku.dk&#x2F;~jyrki&#x2F;Paper&#x2F;WADS95.pdf</code></pre>
评论 #40873162 未加载
评论 #40875056 未加载
评论 #40875000 未加载
评论 #40873228 未加载
tromp11 个月前
&gt; To make it unambiguous we must make sure that no code word is a prefix of another code word.<p>Technically, this is not quite correct. The class of so-called uniquely decodable codes is unambigous, and a superset of the prefix codes. One simple example of a uniquely decodable code is the reverse of a prefix code. For the example in the article that would be<p><pre><code> a 1 b 00 c 10 </code></pre> While the code for a is a prefix of the code of c, one can still unambiguously decode any code sequence by processing it in reverse order. It would be interesting to see a uniquely decodable code that is neither a prefix code nor one in reverse.
评论 #40878776 未加载
评论 #40875389 未加载
评论 #40875179 未加载
评论 #40873413 未加载
评论 #40873670 未加载
评论 #40875172 未加载
atlintots11 个月前
This is great! Are there any other similar tutorials going through writing a Haskell program, but with some more advanced features (monad transformers, lenses, etc)
评论 #40873749 未加载
评论 #40875618 未加载
goldfishgold11 个月前
Coursera’s functional programming course (in Scala) includes a pretty similar Huffman coding assignment with autograder if anybody wants to take a stab at it themselves.<p><a href="https:&#x2F;&#x2F;www.coursera.org&#x2F;learn&#x2F;scala-functional-programming?specialization=scala#modules" rel="nofollow">https:&#x2F;&#x2F;www.coursera.org&#x2F;learn&#x2F;scala-functional-programming?...</a>
banish-m411 个月前
Last time I used Huffman codes, it was to run a MICMAC processor macroprogram (assembly text) in the fewest number of microcycles and to use the fewest microinstructions in the microprogram (microcode). So starting with a histogram of the macroinstructions executed (IIRC, I first wrote an interpreter in C to count how many of each were executed), I crafted a progressive decoding microcode program to implement all of the required ISA macro-operations. IIRC, the macro instruction ISA I created was bit-granular instead of byte-oriented. In the real world, it would&#x27;ve been slow and inconvenient. What&#x27;s nice about Huffman codes is that you can vary the prefix depth based on the distribution of values, so you don&#x27;t have to have lopsided codes based on 1 bit prefixes.<p>Also, the microprogram had to deal with branch prediction because it was a non-superscalar pipelined processor model. Guess the wrong branch, and enjoy wasting cycles on a pipeline stall while the correct branch filters forward.
tankfeeder11 个月前
<a href="https:&#x2F;&#x2F;rosettacode.org&#x2F;wiki&#x2F;Huffman_coding" rel="nofollow">https:&#x2F;&#x2F;rosettacode.org&#x2F;wiki&#x2F;Huffman_coding</a>
ykonstant11 个月前
Hey, since this is likely to attract Haskell programmers: how fast is Haskell these days for a programmer intent on writing optimized code? I am particularly interested in its performance for numerical crunching like matrix operations and other stuff that benefit from SIMD.
评论 #40873943 未加载
评论 #40873780 未加载
评论 #40873689 未加载
评论 #40875608 未加载
评论 #40877187 未加载
评论 #40875959 未加载
评论 #40876921 未加载
评论 #40873651 未加载
评论 #40877542 未加载
alwinaugustin11 个月前
Thanks for sharing. Very nice and insightful.
chvrchbvrner11 个月前
I think there is a typo in the table of the &quot;Creating prefix-free codes&quot; section. D should be &#x27;0010&#x27; (not &#x27;0110&#x27;).<p>Otherwise a great read, thanks!
评论 #40873812 未加载
评论 #40873856 未加载
lo_zamoyski11 个月前
Btw, what is that on the girl&#x27;s shirt in the image?<p>Direct link: <a href="https:&#x2F;&#x2F;lazamar.github.io&#x2F;images&#x2F;data-compressor.svg" rel="nofollow">https:&#x2F;&#x2F;lazamar.github.io&#x2F;images&#x2F;data-compressor.svg</a>
评论 #40886952 未加载
londons_explore11 个月前
For all readers, arithmetic codes are better in nearly all ways. They can be implemented in less RAM and code, they compress and decompress to a better ratio, and the probabilities of different symbols appearing can be dynamically updated during the stream far more easily.<p>The only reason Huffman codes are used is they were invented first and arithmetic codes were patented. That patent has now expired, so we should use the better design.
评论 #40873172 未加载
评论 #40873025 未加载
评论 #40873124 未加载
评论 #40873342 未加载
评论 #40875083 未加载
lynx2311 个月前
Very nice read, thanks for sharing!
评论 #40873734 未加载
bdahz11 个月前
How is the performance when compared to similar implementations in C&#x2F;C++ or Rust?
评论 #40873762 未加载
评论 #40873944 未加载
benreesman11 个月前
Haskell is a really nice language. In general I don’t identify as an X programmer for any value of X: I tend to write in a half dozen languages daily and they all suck in their own special way.<p>But on two separate occasions I made important career decisions with opportunity cost to work with highly lethal GHC contributors: those people are just really good.<p>If Haskell sucks like all languages it’s because Haskell excels at using computers to <i>compute</i> something: Haskell considers data shuffling a strictly secondary concern compared to doing actual computations.
评论 #40873940 未加载
评论 #40874898 未加载
评论 #40874860 未加载
2-3-7-43-180711 个月前
and yet another episode in the series of &quot;look, what I did with Haskell&quot;