I have been waiting for this to hit 1.0 and, more importantly, get popular so that I can use it everywhere. I am really a fan of Yann Collet's work. It is extremely impressive, especially when you consider that lz4 seems to be better than Snappy (from Google) and Zstandard better than LZFSE (from Apple). I think he is the first one to write a practical, fast arithmetic-class coder using ANS. And look at how his Huffman implementation blazes past zlib's Huffman, though it compresses less than FSE [0]. I also like reading his blog posts. While a lot of them go over my head, I can generally make sense of what he is trying and why something works despite the complexity.

[0] https://github.com/Cyan4973/FiniteStateEntropy
There is just so much awesome stuff in this article. Finite State Entropy and Asymmetric Numeral Systems are completely new concepts to me (I've got 7 open tabs just from references FB supplied in the article), as is repcode modeling. I love that they've already built in granular control over the compression tradeoffs you can make, and I can't wait to look into Huff0. If anyone outside of Facebook has started playing with it or is planning to put it into production right away, I'd love to hear about it.
The plot of compression ratio against speed for the various compression levels is pretty helpful for understanding its performance: https://scontent.fsnc1-3.fna.fbcdn.net/t39.2365-6/14146892_944159239044397_638267599_n.jpg

"The x-axis is a decreasing logarithmic scale in megabytes per second; the y-axis is the compression ratio achieved."

I'd love to see a version of this chart that also included Brotli. (And I'm somewhat surprised Brotli isn't mentioned at all.)

(Disclaimer: I work at Google, which made Brotli.)
Note: this is from the same guy who created the popular lz4 compressor, Yann Collet:

http://cyan4973.github.io/lz4/

https://twitter.com/Cyan4973
Yann will be giving a talk on Zstandard at today's @Scale 2016 conference, and the video will be posted. He can answer the most technical questions about Zstandard, but I may be able to answer some as well; we both work on compression at Facebook.
The modern trend in compressors is to use more memory to achieve speed. This is good if you're using big-iron cloud computers...

*"Zstandard has no inherent limit and can address terabytes of memory (although it rarely does). For example, the lower of the 22 levels use 1 MB or less. For compatibility with a broad range of receiving systems, where memory may be limited, it is recommended to limit memory usage to 8 MB. This is a tuning recommendation, though, not a compression format limitation."*

*8 MB* for the smallest preset? Back in the mid-2000s, I was attending a Jabber/XMPP discussion about the viability of using libz for compressing the stream. It turned out that even just a *32 KB* window is *huge* when your connection server is handling thousands of connections at a time, and they were investigating the effect of using a modified libz with an even smaller window (it was hard-coded, back then).

I know Moore's law is in Zstandard's favor w.r.t. memory usage (what's 8 MB when your server's got 64 GB or more?), but I think it's useful to note that this is squarely aimed at web traffic backed by beefy servers.
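For what it's worth, those limits are knobs in the library, not just documentation advice. A minimal sketch of both sides, assuming a recent libzstd with the advanced-parameter API (these setters postdate the 1.0 release): the sender caps its window at 1 MB, and the receiver rejects any frame that would need more than 8 MB.

    /* Sketch only: assumes a recent libzstd exposing
       ZSTD_CCtx_setParameter / ZSTD_DCtx_setParameter. */
    #include <zstd.h>
    #include <stdio.h>

    int main(void) {
        const char src[] = "hello hello hello hello hello hello";
        char dst[256], out[256];

        /* Sender: cap the match window at 2^20 = 1 MB, so no receiver
           ever needs more than ~1 MB of history to decode this frame. */
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 20);
        size_t csize = ZSTD_compress2(cctx, dst, sizeof dst, src, sizeof src);
        ZSTD_freeCCtx(cctx);
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* Receiver: refuse any frame whose window exceeds 2^23 = 8 MB,
           enforcing the article's tuning recommendation locally. */
        ZSTD_DCtx* dctx = ZSTD_createDCtx();
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 23);
        ZSTD_inBuffer  in = { dst, csize, 0 };
        ZSTD_outBuffer ob = { out, sizeof out, 0 };
        size_t ret = ZSTD_decompressStream(dctx, &ob, &in);
        ZSTD_freeDCtx(dctx);
        if (ZSTD_isError(ret)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(ret)); return 1; }

        printf("%zu -> %zu -> %zu bytes\n", sizeof src, csize, ob.pos);
        return 0;
    }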
I'm a complete dunce when it comes to compression and how it fits in the industry, so help me out here. Say that everyone accepts that Zstandard is amazing and we should start using it. What would the adoption process look like? I understand individual programs could implement it, since they would handle both compression and decompression, but what about the web?

Would HTTP servers first have to add support, and then browser vendors would follow?
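Roughly, yes: on the web the mechanism is HTTP content negotiation, so both sides have to learn the new coding before anyone benefits. A hypothetical exchange, assuming "zstd" were registered as a content-coding token (it isn't as of this announcement):

    GET /app.js HTTP/1.1
    Host: example.com
    Accept-Encoding: zstd, br, gzip

    HTTP/1.1 200 OK
    Content-Encoding: zstd

Browsers only advertise codings they can decode, and servers only use codings the client advertised, so rollout can happen incrementally in either order.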
Some more benchmarks on this[0] page.

Also, I actually discovered something *very* interesting (to me at least). At the bottom of the page linked below [0], the attached link says https://github.com/Cyan4973/zstd but then redirects to https://github.com/facebook/zstd. Anyone know why?

[0]: http://facebook.github.io/zstd/

EDIT: After a little bit of sleuthing, it looks like the author of zstd (github.com/Cyan4973) is now contributing[1] to github.com/facebook/zstd.

And the page layout for lz4[2] looks the same as zstd's[0].

Anyone know if Yann Collet works for/with Facebook on things other than zstd?

EDIT 2: In the time it took me to google a couple of things, the child comments have already answered my questions.

Also, previous discussions on zstd (not that it's completely relevant):

https://news.ycombinator.com/item?id=8941955
https://www.reddit.com/r/programming/comments/2tibrh/zstd_a_new_compression_algorithm/

[1]: https://github.com/facebook/zstd/pull/312
[2]: http://cyan4973.github.io/lz4/
The goals sound similar to Apple's LZFSE (see https://github.com/lzfse/lzfse for more). Any comparison out there?
From the bits of testing I've done today, it's phenomenally fast on x86. Much better than gzip (and pigz, for that matter) in every metric I generally care about: CPU usage, compression speed, decompression speed, compression ratio.

On other architectures the picture gets a bit murky; it seems to get handily beaten by pigz through what at first blush I'd guess is just sheer parallelism. It still has solid performance, and is without a shadow of a doubt faster than vanilla gzip. If/as/when I get time, it'll be interesting to dig into why performance is worse there.
This is an awesome blog post that is very well written, but the lack of incompressible-performance analysis prevents it from providing a complete overview of zstd.

Incompressible-performance measurements are important for interactive/realtime workloads, and the numbers are extremely interesting because they can differ dramatically from average-case measurements. LZ4, for instance, has been measured doing 10 GB/sec on incompressible data on a single core of a modern Intel Xeon processor. At the other end of the spectrum is the worst-case scenario, where performance on incompressible data slows to a crawl. I do not recall any examples in this area, but the point is that it is possible for algorithms to have great average-case performance and terrible worst-case performance. Quicksort is probably the most famous example of that concept.

I have no reason to suspect that zstd has bad incompressible performance, but the omission of incompressible performance numbers is unfortunate.
A recent compression discussion I saw involved how compressors fare on incompressible input. For example, suppose you wanted to add compression to all your outbound network traffic. What would happen if compressible traffic were mixed with the incompressible kind? A common case would be sending HTML along with JPEG.

Good compressors can't squeeze any more out of a JPEG, but they can back off fast and go faster. Snappy was designed to do this, and even implementations of gzip do it too. It greatly reduces the fear of CPU overhead for always-on compression. I wonder how zstd handles such cases? (A rough way to check is sketched below.)

*Ignoring security altogether
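The incompressible case is easy to probe yourself. A minimal sketch, not a rigorous benchmark (the 64 MB buffer size and level 1 are arbitrary choices): feed zstd pseudo-random bytes, which no general-purpose compressor can shrink, and look at the ratio and throughput.

    #include <zstd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const size_t n = 64 * 1024 * 1024;   /* 64 MB of noise; arbitrary */
        unsigned char* src = malloc(n);
        size_t bound = ZSTD_compressBound(n);
        char* dst = malloc(bound);
        if (!src || !dst) return 1;
        for (size_t i = 0; i < n; i++) src[i] = (unsigned char)rand();

        clock_t t0 = clock();
        size_t csize = ZSTD_compress(dst, bound, src, n, 1);  /* level 1 */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* On random input the ratio should sit just above 1.0; the open
           question raised above is what happens to the speed number. */
        printf("ratio %.4f (compressed/original), %.0f MB/s\n",
               (double)csize / (double)n, (double)n / secs / 1e6);
        free(src); free(dst);
        return 0;
    }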
>> "It is written in highly portable C, making it suitable for practically every platform used today"<p>I love C, it is not the enemy everyone makes it out to be.<p>It's already in debian: <a href="https://packages.debian.org/stretch/zstd" rel="nofollow">https://packages.debian.org/stretch/zstd</a> and judging by the small requirements,it is portable indeed.
Quick benchmark on a 194 MiB SQL dump:

    gzip -9: 27.574s, 48 MiB output
    zstd -9: 14.182s, 41 MiB output
Thanks, I'll gladly use zstd as a drop-in replacement for my daily backups. :)
Really nice work, compared to what I consider the quite bad Brotli -- an incredibly slow compression standard that only ended up in browsers because it was created by Google.
I think for typical JS/CSS/HTML sizes and decompression times, maximum compression ratio followed by decompression speed is probably what I'd look for. I don't care too much about compression speed: if I have to spend 1 minute compressing JS to crunch it by another 10%, but I serve that file a million times, then as long as decompression doesn't negate the gain in network time saved, it's a win.

I guess the other factor for mobile is, besides memory and decompression speed, how do the various compression schemes fare battery-wise?
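To put rough numbers on that (a back-of-envelope with made-up figures): if the asset is 1 MB, that extra 10% saves 100 KB per download, so a million downloads save roughly 100 GB of transfer. One minute of compression, paid once, is noise next to that; the tradeoff only flips if the denser format costs each client more decompression time than its share of the network time saved.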
If anyone wants to try it on Windows, there is a 7-Zip build with Zstandard support:

https://mcmilk.de/projects/7-Zip-zstd/
If Facebook hopes the new compression algorithm will become a standard, why doesn't it publish an IETF draft? Or will it follow OpenDNS's approach with DNSCrypt: open-sourcing the reference implementation without publishing any IETF draft?
The following link points to a fairly good benchmark/tool that showcases the tradeoffs in real life: since (de)compression takes time, what is the fastest way to transmit data at a given transfer speed?

https://quixdb.github.io/squash-benchmark/unstable/#transfer-plus-processing

Spoilers: zstd wins at ethernet and wifi (and is among the best at 4G), lz4 wins at hard drive encryption… both were designed by the same author.
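(The model behind that page, roughly: total time = original_size / compression_speed + compressed_size / link_bandwidth + compressed_size / decompression_speed. As the link gets faster, the middle term shrinks, which is why the winner shifts from strong-but-slow codecs toward fast-but-lighter ones.)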
I was looking for a Windows version of zstd. On the GitHub page, I could only get the 0.81 version of the Windows tool. Can someone release the 1.0 version of zstd for Windows?
How difficult is this new standard going to be to implement in another language? It seems highly sophisticated -- which is great, of course -- but the cost of that is relying on giants like Facebook to maintain their One True Implementation. For software this is (usually) fine; for a new standard, it's a problem.
That's truly beautiful. Thanks, Facebook! I particularly love that you can pre-compute and reuse dictionaries, say, if you're regularly compressing similar JSON objects.
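A minimal sketch of that round trip, assuming current libzstd's dictionary API. In production you'd train a real dictionary over thousands of sample records with ZDICT_trainFromBuffer() and ship it to both sides; zstd also accepts any raw buffer as dictionary content, so a made-up representative record stands in for a trained dictionary here.

    #include <zstd.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Stand-in for a trained dictionary: any raw buffer works, and a
           record shaped like your real JSON captures the shared structure. */
        const char dict[] = "{\"id\":0,\"name\":\"placeholder\",\"active\":false}";

        const char record[] = "{\"id\":42,\"name\":\"alice\",\"active\":true}";
        char dst[256], out[256];

        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        size_t csize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                               record, sizeof record - 1,
                                               dict, sizeof dict - 1, 3);
        ZSTD_freeCCtx(cctx);
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* Decompression must supply the exact same dictionary. */
        ZSTD_DCtx* dctx = ZSTD_createDCtx();
        size_t dsize = ZSTD_decompress_usingDict(dctx, out, sizeof out,
                                                 dst, csize,
                                                 dict, sizeof dict - 1);
        ZSTD_freeDCtx(dctx);
        if (ZSTD_isError(dsize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(dsize)); return 1; }

        printf("%zu -> %zu -> %zu bytes (same dictionary on both sides)\n",
               sizeof record - 1, csize, dsize);
        return 0;
    }

The key constraint: compressor and decompressor must agree on the dictionary, so it effectively becomes part of your data format.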
TurboHF claims to be 4x faster than zlib's Huffman coding and 2x faster than FSE, and is a generic CPU implementation. Even if the claims are only partially true: if TurboHF is a clean drop-in replacement for zlib and the licensing is friendly, the appeal of zstd drops substantially in my book.

http://encode.ru/threads/2276-TurboHF-1GB-s-Huffman-Coding-Reincarnation

https://sites.google.com/site/powturbo/entropy-coder
Unless they integrate it into software like web servers and web browsers, it will be hard to see it really flourish as a "standard".

But at least within the perimeter of your own systems, you can totally profit from this technology now.
Looks very interesting; however, I'm not impressed by the name. "Zstandard"??? With ".zstd" as the extension? I don't like it.

They should have named it *letter*-zip, along the lines of gzip, bzip, and xzip, with the extension *letter*z. "fz" would have been a good one, since they work at Facebook.