Sadly, the description of arithmetic coding bears almost no resemblance to the actual algorithm (it roughly describes the equal probability case, but that misses most of the point). The description of Shannon-Fano as "bottom up" and Huffman as "top-down" is also exactly backwards (the actual descriptions of those algorithms are accurate, but the labeling is confused).

The article contains a lot of terms you can search for if you are interested in these things, but sadly it is not very informative on its own.
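To spell out the distinction: Huffman construction is bottom-up because it starts from the individual symbols and repeatedly merges the two least-frequent subtrees, while Shannon-Fano splits the whole symbol set top-down. A minimal Python sketch of the Huffman side (the frequencies below are just example values):

    import heapq

    def huffman_codes(freqs):
        # Bottom-up: start with one leaf per symbol and repeatedly merge
        # the two least-frequent subtrees until a single tree remains.
        heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
            tiebreak += 1
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):   # internal node: recurse into children
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:                         # leaf: record the code for this symbol
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    # Frequent symbols get short codes, rare ones long codes.
    print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))

Shannon-Fano would instead take the full sorted symbol list and recursively split it into two halves of roughly equal total probability, assigning a bit at each split.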
An interesting read; I will need to go through it with more care when I get home. A couple of personal interests: I studied with both David Huffman (though not for compression) and Glen Langdon (one of the arithmetic coding pioneers), so I need to see how this article compares with my notes.

I also need to see about getting some of my old (15+ years) course notes online.

In just those two, much history and memory lost, and a real hit for UC Santa Cruz's computer engineering and science departments (Huffman passed away a number of years back, Glen this year after a period of retirement).

Edit: the scope of the article and the title provided... a serious disconnect in terms of breadth. But better than nothing.
If anyone is interested in the PAQ code, I have put one of the stronger but still open-source variants on GitHub, here: https://github.com/robertseaton/paq8pxd
This article is very good, but it says very little about gzip. Even if it introduced no technical innovation, I think gzip was really important for the development of compression software that is not patent-encumbered.
Seems like even though people went after this, there hasn't been much innovation in the last few decades. Most of the new stuff looks quite incremental. It almost seems like we are losing our edge in our ability to build from the ground up. This is because we are part of a system that keeps an eagle eye on new ideas. We need to take some of that pressure off so that we can think a bit outside the box.
Site seems to be down. Cached version: http://webcache.googleusercontent.com/search?q=cache%3Aieeeghn.org%2Fwiki%2Findex.php%2FHistory_of_Lossless_Data_Compression_Algorithms&oq=cache%3Aieeeghn.org%2Fwiki%2Findex.php%2FHistory_of_Lossless_Data_Compression_Algorithms
One interesting domain-specific class of compression algorithms not mentioned here is for lossless audio compression, which tends to use a different (though also pretty simple) technique, somewhat related to PPM. A common approach is to predict the waveform using linear predictive coding (LPC), and then entropy-code the residuals. FLAC does that, for example.
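To make the idea concrete, here is a minimal Python sketch of that pipeline, not FLAC's actual format: a fixed order-2 linear predictor followed by Rice coding of the residuals. The test signal, the predictor order, and the Rice parameter k are just illustrative choices.

    import math

    def fixed_predict_residuals(samples):
        # Fixed order-2 linear predictor (in the spirit of FLAC's fixed
        # predictors): predict x[n] as 2*x[n-1] - x[n-2].
        res = list(samples[:2])                  # warm-up samples kept verbatim
        for n in range(2, len(samples)):
            pred = 2 * samples[n - 1] - samples[n - 2]
            res.append(samples[n] - pred)
        return res

    def rice_encode(residuals, k=4):
        # Rice coding: small residuals get short codes, which is where the
        # compression comes from once the predictor has done its job.
        bits = []
        for r in residuals:
            u = 2 * r if r >= 0 else -2 * r - 1  # zigzag-map signed -> unsigned
            q, rem = u >> k, u & ((1 << k) - 1)
            bits.append("1" * q + "0" + format(rem, "0{}b".format(k)))
        return "".join(bits)

    # A smooth test signal: the predictor leaves only tiny residuals behind.
    samples = [int(2000 * math.sin(0.05 * n)) for n in range(1000)]
    coded = rice_encode(fixed_predict_residuals(samples))
    print(len(coded) // 8, "coded bytes vs", 2 * len(samples), "raw 16-bit bytes")

The same structure (predict, then entropy-code what the predictor missed) shows up across lossless audio codecs; they mostly differ in how the predictor coefficients are chosen and how the residual coder adapts.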
Are there compression techniques that are only "mostly" lossless? I was thinking something along the lines of: for any delta > 0 and N > 0, the probability that compressing and then decompressing a random stream of N bytes results in loss is less than delta?