There's quite a lot of retro modern-LZ activity too! LZ turns out to be amazing on old machines: decompression is often only a couple of times slower than a straight block copy. Optimal compressors and tight control over the algorithm have led to some very tight demos.<p><a href="https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index.html" rel="nofollow">https://www.brutaldeluxe.fr/products/crossdevtools/lz4/index...</a> - LZ4 Data Compression - a rather long and in-depth article looking at LZ4 on the 65816 in the Apple IIgs, with a decompressor that exploits that processor's block-move instructions.<p><a href="https://github.com/emmanuel-marty/lzsa" rel="nofollow">https://github.com/emmanuel-marty/lzsa</a> - LZSA - an LZ4-like modern LZ that beats LZ4 in both speed and compression (at least on the 8-bitters it targets) - includes a neat chart of speed/compression trade-offs for a variety of algorithms on a ZX Spectrum
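To see why it's so cheap: here's a minimal C sketch of an LZ77-style decompressor (the token format is made up for illustration - it isn't LZ4's or LZSA's). The inner loop is nothing but byte copies, and on a CPU with block-move instructions like the 65816's MVN, both branches collapse into hardware copies.

    #include <stddef.h>
    #include <stdint.h>

    /* Minimal LZ77-style decompressor.  Hypothetical token format:
     *   top bit set   -> match: 7-bit length field, then a 1-byte offset
     *   top bit clear -> run of (token + 1) literal bytes follows */
    size_t lz_decompress(const uint8_t *src, size_t src_len, uint8_t *dst)
    {
        const uint8_t *end = src + src_len;
        uint8_t *out = dst;

        while (src < end) {
            uint8_t token = *src++;
            if (token & 0x80) {
                /* Match: a backward block copy within the output.
                 * Byte-by-byte forward copy handles overlaps (RLE). */
                size_t len = (size_t)(token & 0x7f) + 2;
                const uint8_t *from = out - (*src++ + 1);
                while (len--)
                    *out++ = *from++;
            } else {
                /* Literals: a block copy straight from the input. */
                size_t len = (size_t)token + 1;
                while (len--)
                    *out++ = *src++;
            }
        }
        return (size_t)(out - dst);
    }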
Discussed at the time:<p><i>Modern LZ Compression</i> - <a href="https://news.ycombinator.com/item?id=19064791" rel="nofollow">https://news.ycombinator.com/item?id=19064791</a> - Feb 2019 (4 comments)
"Managing Gigabytes" by Witten et al is _the_ book that got me into the field of compression back in the day. Granted, it’s a bit dated now as there’s been a lot of progress in the field, but I’d still heartily recommend it to newcomers.
Only partly related, since it deals with a specific type of data (English language Wikipedia), some of you might enjoy reading about the Hutter Prize: <a href="https://en.wikipedia.org/wiki/Hutter_Prize" rel="nofollow">https://en.wikipedia.org/wiki/Hutter_Prize</a>
Does anyone have recommendations for learning the xz format? Particularly as used by the ZIM archival format: <a href="https://en.wikipedia.org/wiki/ZIM_(file_format)" rel="nofollow">https://en.wikipedia.org/wiki/ZIM_(file_format)</a>
> First, we just shorten any symbols that are longer than our maximum code length — 11 — to that value. This means that our tree will no longer be a Huffman tree, so we need to do a fixup pass to redistribute the error we've introduced.<p>This can in fact be solved directly and optimally: <a href="https://en.wikipedia.org/wiki/Package-merge_algorithm" rel="nofollow">https://en.wikipedia.org/wiki/Package-merge_algorithm</a> ; <a href="https://en.wikipedia.org/wiki/Huffman_coding#Length-limited_Huffman_coding" rel="nofollow">https://en.wikipedia.org/wiki/Huffman_coding#Length-limited_...</a>
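Package-merge is short enough to sketch out, too. This is a direct (unoptimized) C version - my own illustration, not the article's code - assuming n >= 2 symbols with positive frequencies and n <= 2^L so a valid code exists. Each symbol's optimal length is the number of selected items its leaf appears in:

    #include <stdlib.h>

    typedef struct Item {
        unsigned long w;          /* summed frequency of this item           */
        int sym;                  /* symbol index for a leaf, -1 for a package */
        const struct Item *a, *b; /* children when this item is a package    */
    } Item;

    static int cmp_item(const void *p, const void *q)
    {
        unsigned long x = (*(const Item *const *)p)->w;
        unsigned long y = (*(const Item *const *)q)->w;
        return (x > y) - (x < y);
    }

    /* Count one bit of code length for every leaf under this item. */
    static void add_lengths(const Item *it, int *len)
    {
        if (it->sym >= 0)
            len[it->sym]++;
        else {
            add_lengths(it->a, len);
            add_lengths(it->b, len);
        }
    }

    /* Optimal code lengths for n symbols, none longer than L bits.
     * Assumes n >= 2, freq[i] > 0, and n <= (1 << L). */
    void package_merge(const unsigned long *freq, int n, int L, int *len)
    {
        Item *leaves = malloc((size_t)n * sizeof *leaves);
        Item *pool   = malloc((size_t)L * n * sizeof *pool);  /* package storage */
        const Item **cur = malloc(2 * (size_t)n * sizeof *cur);
        const Item **pkg = malloc((size_t)n * sizeof *pkg);
        int i, lvl, cur_n, pkg_n, used = 0;

        for (i = 0; i < n; i++) {
            leaves[i].w = freq[i]; leaves[i].sym = i;
            leaves[i].a = leaves[i].b = NULL;
            cur[i] = &leaves[i];
        }
        cur_n = n;
        qsort(cur, (size_t)cur_n, sizeof *cur, cmp_item);

        for (lvl = 1; lvl < L; lvl++) {
            /* Package: pair adjacent items, dropping an odd one out. */
            for (pkg_n = 0, i = 0; i + 1 < cur_n; i += 2) {
                Item *p = &pool[used++];
                p->w = cur[i]->w + cur[i + 1]->w;
                p->sym = -1; p->a = cur[i]; p->b = cur[i + 1];
                pkg[pkg_n++] = p;
            }
            /* Merge: the next list is all the leaves plus the packages. */
            for (i = 0; i < n; i++) cur[i] = &leaves[i];
            for (i = 0; i < pkg_n; i++) cur[n + i] = pkg[i];
            cur_n = n + pkg_n;
            qsort(cur, (size_t)cur_n, sizeof *cur, cmp_item);
        }

        /* The 2n-2 cheapest items of the final list encode the answer. */
        for (i = 0; i < n; i++) len[i] = 0;
        for (i = 0; i < 2 * n - 2; i++)
            add_lengths(cur[i], len);

        free(leaves); free(pool); free(cur); free(pkg);
    }

With the qsorts this is O(L·n log n), which is plenty for the article's L = 11; production encoders tend to use the lazier "boundary" variant of package-merge that avoids materializing the lists.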
I just made a Huffman encoder on the C64, for fun. I need to understand how you go from Huffman's variable-length codes to length-limited codes, because you don't want any code to go above a certain length. Hmm...
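If it's the fixed-size decode table that's confusing: the codes stay variable length - you just cap the maximum (11 bits in the article) and build a table with one entry per possible 11-bit window, replicating each code into every slot that shares its prefix. A rough C sketch, assuming canonical codes that are already length-limited (names here are mine, not the article's):

    #include <stdint.h>

    #define MAX_BITS 11                    /* the cap on code length */

    typedef struct { uint16_t symbol; uint8_t length; } Entry;

    /* table must have (1 << MAX_BITS) entries.  code[s] holds symbol s's
     * canonical code in its low length[s] bits (MSB-first bitstream);
     * assumes the code is complete, i.e. the Kraft sum is exactly one. */
    void build_decode_table(const uint16_t *code, const uint8_t *length,
                            int n, Entry *table)
    {
        for (int s = 0; s < n; s++) {
            int shift = MAX_BITS - length[s];
            int base  = code[s] << shift;  /* left-align the code */
            for (int i = 0; i < (1 << shift); i++) {
                table[base + i].symbol = (uint16_t)s;
                table[base + i].length = length[s];
            }
        }
    }

    /* Decoding is then: peek MAX_BITS bits, look up the entry, emit
     * entry.symbol, and consume only entry.length bits. */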
I wonder if LZ would still be the standard if not for the inertia of gzip/zip? There are surely better and comparably fast algorithms (PAQ, PPM, etc.)