This is very interesting!<p>I've tried something somewhat similar in the past. I was looking at implementing an extremely fast decompressor, with a ratio similar to LZ4. I was able to get 2x the decompression speed of LZ4, but struggled with compression ratio. The idea was to have 16-byte matches, and allow each match to apply a 16-bit mask telling whether each byte is part of the match or a literal. Then I restricted the compressor to only be able to use 16 distinct masks.<p>This was extremely fast to decompress, because each 16-byte match is: load the 16-byte match into an AVX2 register, load 16 bytes of literals, load the mask you're using, shuffle the literals, then blend the literals and the match. And because the matches are fixed-size, you can start the fetch for multiple matches in parallel.<p>However, the problem I ran into, and would love to solve, is that I also wanted fast-ish compression speed, and it is very hard to search for good matches quickly, since you have holes in the match.<p>I guess the author is looking at GPU compression, so they are taking a somewhat brute-force approach. But I'd be interested to see how they're doing the match finding, and what kind of speed they're getting.
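In intrinsics, each of those match decodes comes out to roughly the sketch below (illustrative only, not my original code; the per-mask shuffle/blend controls are precomputed, and 16 bytes fit in one 128-bit lane):<p><pre><code>  #include <immintrin.h>
  #include <stdint.h>

  /* Sketch of one fixed-size match decode. For each of the 16 allowed
     masks, three things are precomputed:
       shuffle_tbl[m] - pshufb control that scatters the packed literal
                        bytes to the positions the mask marks as literal
       blend_tbl[m]   - 0xFF at literal positions, 0x00 at match positions
       lit_count[m]   - how many literal bytes this mask consumes */
  static inline void decode_match(uint8_t *dst, const uint8_t *match_src,
                                  const uint8_t **literals, unsigned m,
                                  const __m128i shuffle_tbl[16],
                                  const __m128i blend_tbl[16],
                                  const uint8_t lit_count[16])
  {
      __m128i match  = _mm_loadu_si128((const __m128i *)match_src);
      __m128i lits   = _mm_loadu_si128((const __m128i *)*literals);
      __m128i spread = _mm_shuffle_epi8(lits, shuffle_tbl[m]);       /* place literals */
      __m128i out    = _mm_blendv_epi8(match, spread, blend_tbl[m]); /* merge with match */
      _mm_storeu_si128((__m128i *)dst, out);
      *literals += lit_count[m];  /* consume only the literal bytes actually used */
  }
</code></pre>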
The initial link in this piece, Compression is Compilation, was really helpful to read before reading this piece: <a href="http://richg42.blogspot.com/2015/10/compression-is-compilation.html" rel="nofollow">http://richg42.blogspot.com/2015/10/compression-is-compilati...</a>. I think I got more out of it than this piece.
The idea that a compressor is a compiler made me think back to Chaitin's work on incompleteness and the limits of mathematics, which I wrote a review of a few years ago: <a href="https://p4bl0.net/shebang/the-limits-of-mathematics.html" rel="nofollow">https://p4bl0.net/shebang/the-limits-of-mathematics.html</a>
I don't understand why the author encodes every literal byte as a separate instruction when in reality they're just consecutive bytes.<p>The whole:<p><pre><code> LIT 13
LIT 24
LIT 65
LIT 32
...
</code></pre>
could have been written as a single instruction:<p><pre><code> LIT [13, 24, 65, 32, ...]
</code></pre>
It's almost as if the author is trying too hard to support their point that their variant looks better.<p>"Notice how much faster it makes progress through the file vs. LZSS"<p>Yeah, because you encode every literal separately? It's all LIT instructions?
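For illustration, a run-style LIT could decode as a single length-prefixed copy, something like this sketch (hypothetical encoding, not the article's actual format):<p><pre><code>  #include <stdint.h>
  #include <string.h>

  /* Hypothetical literal-run decode: one length byte n followed by n raw
     literal bytes, replacing n separate LIT instructions with one copy. */
  static const uint8_t *decode_lit_run(const uint8_t *src, uint8_t **dst)
  {
      uint8_t n = *src++;    /* run length */
      memcpy(*dst, src, n);  /* consecutive literals copied in one go */
      *dst += n;
      return src + n;
  }
</code></pre>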
Can an XOR-only decompressor do the standard LZ77 trick of encoding many repeated bytes with a copy operation that refers to not-yet-decompressed data? It seems to me that you'd have to have a lot of 0 patch bytes for that. Huffman coding could encode all of those 0 bytes in 1 bit each, but that still loses out to regular LZ77. So it seems you might still want to keep "COPY" around for RLE-style data.
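For reference, the trick in question is the overlapping copy, sketched generically below (offset 1 turns a single already-written byte into an arbitrarily long run, because each copied byte becomes the source of the next one):<p><pre><code>  #include <stdint.h>
  #include <stddef.h>

  /* Generic LZ77 overlap copy: src may overlap dst when offset is smaller
     than len, so the copy must go byte by byte, front to back. */
  static void overlap_copy(uint8_t *dst, size_t offset, size_t len)
  {
      const uint8_t *src = dst - offset;  /* points into already-decoded output */
      for (size_t i = 0; i != len; i++)
          dst[i] = src[i];                /* dst[i] may feed a later src[j] */
  }
</code></pre>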
I wonder how an LZ_ADD-compressed bitmap would compare to a PNG. PNG is basically just DEFLATE on a bitmap, but you first apply some additional encoding to each row to help the compressor, by expressing the row as a kind of difference from the previous row. It would be cool if that became unnecessary.
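That per-row step is PNG's filter pass; one of the filters ("Up") just stores each byte as the difference from the byte directly above it, roughly this sketch:<p><pre><code>  #include <stdint.h>
  #include <stddef.h>

  /* Sketch of PNG's "Up" filter: each byte is stored as the difference
     (mod 256) from the byte directly above it, which makes flat or slowly
     varying regions much friendlier to DEFLATE. */
  static void up_filter(uint8_t *out, const uint8_t *row,
                        const uint8_t *prev_row, size_t width)
  {
      for (size_t x = 0; x != width; x++)
          out[x] = (uint8_t)(row[x] - (prev_row ? prev_row[x] : 0));
  }
</code></pre>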