As a general-purpose compressor, iguana decompresses a lot slower than advertised when tested with a typical data compression corpus.

It requires AVX-512 VBMI2, available only on Ice Lake / Tiger Lake / AMD Zen 4.

- Benchmark from encode.su experts: https://encode.su/threads/4041-Iguana-a-fast-vectorized-compressor?p=79634&viewfull=1#post79634

- Benchmark from the iguana developers: https://github.com/SnellerInc/sneller/tree/master/cmd/iguanabench

Silesia corpus / CPU: Xeon Gold 5320

  compressor    ratio   decompression speed
  zstd -b3      3.186    943.9 MB/s
  zstd -b9      3.574   1015.8 MB/s
  zstd -b18     3.967    910.6 MB/s
  lz4 -b1       2.101   3493.8 MB/s
  lz4 -b5       2.687   3323.5 MB/s
  lz4 -b9       2.721   3381.5 MB/s
  iguana -t=0   2.58    4450   MB/s
  iguana -t=1   3.11    2260   MB/s

As you can see, iguana with entropy coding enabled (-t=1) has a compression ratio similar to zstd -3, but it decompresses more than twice as fast. With entropy coding disabled (-t=0), iguana has a compression ratio roughly equivalent to lz4 -5 and decompresses about 33% faster.
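For anyone who wants to check these numbers themselves, here is a rough sketch of the invocations. The zstd/lz4 -b benchmark flags are standard; the Silesia download URL is the commonly used mirror; the iguanabench usage (positional input file alongside the -t flag shown above) is an assumption, so check its README.

    # fetch the Silesia corpus and concatenate it into one tar (common setup)
    wget http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip && unzip silesia.zip -d silesia
    tar -cf silesia.tar silesia/

    # built-in benchmark modes: -b<level> compresses and decompresses in memory
    zstd -b3 silesia.tar
    zstd -b9 silesia.tar
    zstd -b18 silesia.tar
    lz4 -b1 silesia.tar
    lz4 -b5 silesia.tar
    lz4 -b9 silesia.tar

    # iguana's benchmark tool from the sneller repo (usage assumed)
    cd sneller/cmd/iguanabench && go build
    ./iguanabench -t=0 silesia.tar
    ./iguanabench -t=1 silesia.tar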
It looks similar to LZSSE. We tried it in ClickHouse, but then removed it:

https://github.com/ClickHouse/ClickHouse/pull/24424

Reasons:
- the decompression speed is slightly better than lz4's, but the compression speed is low;
- the code was imperfect, and the fuzzer found issues.

The LZSSE library was abandoned five years ago, but it has great blog posts worth reading: https://github.com/ConorStokes/LZSSE

Iguana looks promising, but the AVX-512 requirement is too restrictive. We need something that works on both x86 and ARM. Also, integrating Go assembly into other software is not easy. And the AGPL license makes it incompatible.
Technically this looks really impressive. Great to see a new compression approach that supports extremely high-performance decompression, with a high-performance open-source implementation.

Re: winning adoption for new compression approaches, there's an interesting podcast interview [1] with Yann Collet (of lz4 / zstd).

Some factors Yann discussed that helped lz4 and zstd gain traction: permissive licensing (BSD); implementation in C, for the widest support for inclusion in other software ecosystems; open development and paying attention to issues raised by users of the software; and the new compression approach being able to beat an existing popular approach in some use cases with no downside. E.g., if a hypothetical new compression approach has 200% faster decompression but a 10% worse compression ratio, there's friction in introducing it into an existing system, as the new approach might first require purchasing and deploying additional storage. Whereas a new approach that is 50% faster with exactly the same or a slightly better compression ratio can be adopted with much less friction.

It looks like the Iguana code has recently been relicensed under Apache instead of AGPL (which is used for the rest of the sneller repo), which could lower the barrier for other projects to consider adopting Iguana, although there are still dependencies from the Iguana code on AGPL-licensed files elsewhere in the sneller repo.

[1] https://corecursive.com/data-compression-yann-collet/
Thank you, promising work!

Question: how was zstd built for these tests?

In other words, was the possibility of a 2-stage PGO+LTO optimized build taken into account?

(The Alpine zstd package claims ~"+30% faster on x86_64 than the default makefile" [1].)

[1]
<i>""<p># 2-stage pgo+lto build (non-bootstrap), standard meson usage.<p># note that with clang,<p># llvm-profdata merge --output=output/somefilename(?) output/</i>.profraw<p># is needed.<p># believe it or not, this is +30% faster on x86_64 than the default makefile build (same params)..<p># maybe needs more testing<p># shellcheck disable=2046<p>""*<p><a href="https://github.com/alpinelinux/aports/blob/master/main/zstd/APKBUILD">https://github.com/alpinelinux/aports/blob/master/main/zstd/...</a>
Most time-series / analytical databases are already using or switching to integer compression [1], where you can compress and decompress several times faster (>100 GB/s; see TurboBitByte in [2]) than with general-purpose compressors.

[1] https://github.com/powturbo/TurboPFor-Integer-Compression

[2] https://github.com/powturbo/TurboPFor-Integer-Compression/issues/96
Is this affected by Microsoft's patent on various rANS coding and decoding techniques?

If not, how does it avoid the (rather vague) claims?

https://patents.google.com/patent/US11234023B2/en
Decompression speed looks good, but in my experience, once you get past a certain point (~X000 MB/s), the performance gains become pretty marginal in real-world applications. I'd like to see compression speeds, and performance on AVX when AVX-512 is not available.