I have been waiting for this to hit 1.0 and, more importantly, get popular so that I can use it everywhere. I am really a fan of Yann Collet's work. It is extremely impressive, especially when you consider that lz4 seems to be better than Snappy (from Google) and Zstandard better than LZFSE (from Apple). I think he is the first one to write a practical, fast arithmetic-class coder using ANS. And look at how his Huffman implementation blazes past zlib's Huffman, though it compresses less than FSE [0]. I also like reading his blog posts. While a lot of them go over my head, I can generally make sense of what he is trying and why something works despite the complexity.

[0] https://github.com/Cyan4973/FiniteStateEntropy
There is just so much awesome stuff in this article. Finite State Entropy and Asymmetric Numeral Systems are completely new concepts to me (I've got 7 open tabs just from references FB supplied in the article), as is repcode modeling. I love that they've already built in granular control over the compression tradeoffs you can make, and I can't wait to look into Huff0. If anyone outside of Facebook has started playing with it or is planning to put it into production right away, I'd love to hear about it.
The plot of compression ratio against speed for the various compression levels is pretty helpful for understanding its performance: https://scontent.fsnc1-3.fna.fbcdn.net/t39.2365-6/14146892_944159239044397_638267599_n.jpg

"The x-axis is a decreasing logarithmic scale in megabytes per second; the y-axis is the compression ratio achieved."

I'd love to see a version of this chart that also included Brotli. (And I'm somewhat surprised Brotli isn't mentioned at all.)

(Disclaimer: I work at Google, which made Brotli.)
Note: this is from the same guy who created the popular lz4 compressor, Yann Collet:

http://cyan4973.github.io/lz4/

https://twitter.com/Cyan4973
Yann will be giving a talk on Zstandard at today's @Scale 2016 conference, and the video will be posted. He can answer the most technical questions about Zstandard, but I may be able to answer some as well; we both work on compression at Facebook.
The modern trend in compressors is to use more memory to achieve speed. This is good if you're using big-iron cloud computers...

*"Zstandard has no inherent limit and can address terabytes of memory (although it rarely does). For example, the lower of the 22 levels use 1 MB or less. For compatibility with a broad range of receiving systems, where memory may be limited, it is recommended to limit memory usage to 8 MB. This is a tuning recommendation, though, not a compression format limitation."*

*8 MB* for the smallest preset? Back in the mid-2000s, I was attending a Jabber/XMPP discussion about the viability of using libz for compressing the stream. It turned out that even just a *32 KB* window is *huge* when your connection server is handling thousands of connections at a time, and they were investigating the effect of using a modified libz with an even smaller window (it was hard-coded, back then).

I know Moore's law is in Zstandard's favor w.r.t. memory usage (what's 8 MB when your server's got 64 GB or more?), but I think it's useful to note that this is squarely aimed at web traffic backed by beefy servers.
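For what it's worth, those limits are knobs in the library, not just documentation advice. A minimal sketch of both sides, assuming a recent libzstd with the advanced-parameter API (these setters postdate the 1.0 release): the sender caps its window at 1 MB, and the receiver rejects any frame that would need more than 8 MB.

    /* Sketch only: assumes a recent libzstd exposing
       ZSTD_CCtx_setParameter / ZSTD_DCtx_setParameter. */
    #include <zstd.h>
    #include <stdio.h>

    int main(void) {
        const char src[] = "hello hello hello hello hello hello";
        char dst[256], out[256];

        /* Sender: cap the match window at 2^20 = 1 MB, so no receiver
           ever needs more than ~1 MB of history to decode this frame. */
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 20);
        size_t csize = ZSTD_compress2(cctx, dst, sizeof dst, src, sizeof src);
        ZSTD_freeCCtx(cctx);
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* Receiver: refuse any frame whose window exceeds 2^23 = 8 MB,
           enforcing the article's tuning recommendation locally. */
        ZSTD_DCtx* dctx = ZSTD_createDCtx();
        ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 23);
        ZSTD_inBuffer  in = { dst, csize, 0 };
        ZSTD_outBuffer ob = { out, sizeof out, 0 };
        size_t ret = ZSTD_decompressStream(dctx, &ob, &in);
        ZSTD_freeDCtx(dctx);
        if (ZSTD_isError(ret)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(ret)); return 1; }

        printf("%zu -> %zu -> %zu bytes\n", sizeof src, csize, ob.pos);
        return 0;
    }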
I'm a complete dunce when it comes to compression and how it fits in the industry, so help me out here. Say that everyone accepts that Zstandard is amazing and we should start using it. What would the adoption process look like? I understand individual programs could implement it, since they would handle both compression and decompression, but what about the web?

Would HTTP servers first have to add support, and then browser vendors would follow?
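Roughly, yes: on the web the mechanism is HTTP content negotiation, so both sides have to learn the new coding before anyone benefits. A hypothetical exchange, assuming "zstd" were registered as a content-coding token (it isn't as of this announcement):

    GET /app.js HTTP/1.1
    Host: example.com
    Accept-Encoding: zstd, br, gzip

    HTTP/1.1 200 OK
    Content-Encoding: zstd

Browsers only advertise codings they can decode, and servers only use codings the client advertised, so rollout can happen incrementally in either order.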
Some more benchmarks on this[0] page.

Also, I actually discovered something *very* interesting (to me at least). At the bottom of the page linked below [0], the attached link says https://github.com/Cyan4973/zstd but then redirects to https://github.com/facebook/zstd. Anyone know why?

[0]: http://facebook.github.io/zstd/

EDIT: After a little bit of sleuthing, it looks like the author of zstd (github.com/Cyan4973) is now contributing[1] to github.com/facebook/zstd.

And the page layout for lz4[2] looks the same as zstd's[0].

Anyone know if Yann Collet works for/with Facebook on things other than zstd?

EDIT 2: In the time it took me to google a couple of things, the child comments have already answered my questions.

Also, previous discussions on zstd (not that it's completely relevant):

https://news.ycombinator.com/item?id=8941955
https://www.reddit.com/r/programming/comments/2tibrh/zstd_a_new_compression_algorithm/

[1]: https://github.com/facebook/zstd/pull/312
[2]: http://cyan4973.github.io/lz4/
The goals sound similar to Apple's LZFSE (see https://github.com/lzfse/lzfse for more). Any comparison out there?
From the bits of testing I've done today, it's phenomenally fast on x86. Much better than gzip (and pigz, for that matter) in every metric I generally care about: CPU usage, compression speed, decompression speed, compression ratio.

On other architectures the picture gets a bit murky; it seems to get handily beaten by pigz through what at first blush I'd guess is just sheer parallelism. It still has solid performance, and is without a shadow of a doubt faster than vanilla gzip. If/as/when I get time, it'll be interesting to dig into why performance is worse there.
This is an awesome blog post that is very well written, but the lack of incompressible-performance analysis prevents it from providing a complete overview of zstd.

Incompressible-performance measurements are important for interactive/realtime workloads, and the numbers are extremely interesting because they can differ dramatically from average-case measurements. LZ4, for instance, has been measured doing 10 GB/sec on incompressible data on a single core of a modern Intel Xeon processor. At the other end of the spectrum is the worst-case scenario, where performance on incompressible data slows to a crawl. I do not recall any examples in this area, but the point is that it is possible for algorithms to have great average-case performance and terrible worst-case performance. Quicksort is probably the most famous example of that concept.

I have no reason to suspect that zstd has bad incompressible performance, but the omission of incompressible performance numbers is unfortunate.
A recent compression discussion I saw involved how compressors fare on incompressible input. For example, suppose you wanted to add compression to all your outbound network traffic. What would happen if compressible traffic were mixed with the incompressible kind? A common case would be sending HTML along with JPEG.

Good compressors can't squeeze any more out of a JPEG, but they can back off fast and go faster. Snappy was designed to do this, and even implementations of gzip do it too. It greatly reduces the fear of CPU overhead for always-on compression. I wonder how zstd handles such cases? (A rough way to check is sketched below.)

*Ignoring security altogether
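The incompressible case is easy to probe yourself. A minimal sketch, not a rigorous benchmark (the 64 MB buffer size and level 1 are arbitrary choices): feed zstd pseudo-random bytes, which no general-purpose compressor can shrink, and look at the ratio and throughput.

    #include <zstd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const size_t n = 64 * 1024 * 1024;   /* 64 MB of noise; arbitrary */
        unsigned char* src = malloc(n);
        size_t bound = ZSTD_compressBound(n);
        char* dst = malloc(bound);
        if (!src || !dst) return 1;
        for (size_t i = 0; i < n; i++) src[i] = (unsigned char)rand();

        clock_t t0 = clock();
        size_t csize = ZSTD_compress(dst, bound, src, n, 1);  /* level 1 */
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* On random input the ratio should sit just above 1.0; the open
           question raised above is what happens to the speed number. */
        printf("ratio %.4f (compressed/original), %.0f MB/s\n",
               (double)csize / (double)n, (double)n / secs / 1e6);
        free(src); free(dst);
        return 0;
    }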
>> "It is written in highly portable C, making it suitable for practically every platform used today"<p>I love C, it is not the enemy everyone makes it out to be.<p>It's already in debian: <a href="https://packages.debian.org/stretch/zstd" rel="nofollow">https://packages.debian.org/stretch/zstd</a> and judging by the small requirements,it is portable indeed.
Quick benchmark on a 194 MiB SQL dump:

    gzip -9: 27.574s, 48 MiB output
    zstd -9: 14.182s, 41 MiB output
Thanks, I'll gladly use zstd as a drop-in replacement for my daily backups. :)
Really nice work, compared to what I consider the quite bad Brotli -- an incredibly slow compression standard that only ended up in browsers because it was created by Google.
I think for typical JS/CSS/HTML sizes and decompression times, maximum compression ratio followed by decompression speed is probably what I'd look for. I don't care too much about compression speed: if I have to spend 1 minute compressing JS to crunch it by another 10%, but I serve that file a million times, then as long as decompression doesn't negate the gain in network time saved, it's a win.

I guess the other factor for mobile is, besides memory and decompression speed, how do the various compression schemes fare battery-wise?
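To put rough numbers on that (a back-of-envelope with made-up figures): if the asset is 1 MB, that extra 10% saves 100 KB per download, so a million downloads save roughly 100 GB of transfer. One minute of compression, paid once, is noise next to that; the tradeoff only flips if the denser format costs each client more decompression time than its share of the network time saved.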
If anyone wants to try it on Windows, there is a 7-Zip build with Zstandard support:

https://mcmilk.de/projects/7-Zip-zstd/
If Facebook hopes the new compression algorithm will become a standard, why doesn't it publish an IETF draft? Or will it follow OpenDNS's approach with DNSCrypt: open-sourcing the reference implementation without publishing any IETF draft?
The following link points to a fairly good benchmark/tool that showcases the tradeoffs in real life: since (de)compression takes time, what is the fastest way to transmit data at a given transfer speed?

https://quixdb.github.io/squash-benchmark/unstable/#transfer-plus-processing

Spoilers: zstd wins at ethernet and wifi (and is among the best at 4G), lz4 wins at hard drive encryption… both were designed by the same author.
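(The model behind that page, roughly: total time = original_size / compression_speed + compressed_size / link_bandwidth + compressed_size / decompression_speed. As the link gets faster, the middle term shrinks, which is why the winner shifts from strong-but-slow codecs toward fast-but-lighter ones.)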
I was looking for a Windows version of zstd. On the GitHub page, I could only get the 0.81 version of the Windows tool. Can someone release the 1.0 version of zstd for Windows?
How difficult is this new standard going to be to implement in another language? It seems highly sophisticated -- which is great, of course -- but the cost of that is relying on giants like Facebook to maintain their One True Implementation. For software this is (usually) fine; for a new standard, it's a problem.
That's truly beautiful. Thanks, Facebook! I particularly love that you can pre-compute and reuse dictionaries, say, if you're regularly compressing similar JSON objects.
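A minimal sketch of that round trip, assuming current libzstd's dictionary API. In production you'd train a real dictionary over thousands of sample records with ZDICT_trainFromBuffer() and ship it to both sides; zstd also accepts any raw buffer as dictionary content, so a made-up representative record stands in for a trained dictionary here.

    #include <zstd.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Stand-in for a trained dictionary: any raw buffer works, and a
           record shaped like your real JSON captures the shared structure. */
        const char dict[] = "{\"id\":0,\"name\":\"placeholder\",\"active\":false}";

        const char record[] = "{\"id\":42,\"name\":\"alice\",\"active\":true}";
        char dst[256], out[256];

        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        size_t csize = ZSTD_compress_usingDict(cctx, dst, sizeof dst,
                                               record, sizeof record - 1,
                                               dict, sizeof dict - 1, 3);
        ZSTD_freeCCtx(cctx);
        if (ZSTD_isError(csize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(csize)); return 1; }

        /* Decompression must supply the exact same dictionary. */
        ZSTD_DCtx* dctx = ZSTD_createDCtx();
        size_t dsize = ZSTD_decompress_usingDict(dctx, out, sizeof out,
                                                 dst, csize,
                                                 dict, sizeof dict - 1);
        ZSTD_freeDCtx(dctx);
        if (ZSTD_isError(dsize)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(dsize)); return 1; }

        printf("%zu -> %zu -> %zu bytes (same dictionary on both sides)\n",
               sizeof record - 1, csize, dsize);
        return 0;
    }

The key constraint: compressor and decompressor must agree on the dictionary, so it effectively becomes part of your data format.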
TurboHF claims to be 4x faster than zlib's Huffman coding and 2x faster than FSE, and is a generic CPU implementation. Even if the claims are only partially true: if TurboHF is a clean drop-in replacement for zlib and the licensing is friendly, the appeal of zstd drops substantially in my book.

http://encode.ru/threads/2276-TurboHF-1GB-s-Huffman-Coding-Reincarnation

https://sites.google.com/site/powturbo/entropy-coder
Unless they integrate it into software like web servers and web browsers, it will be hard to see it really flourish as a "standard".

But at least within the perimeter of your own systems, you can totally profit from this technology now.
Looks very interesting; however, I'm not impressed by the name. "Zstandard"??? With ".zstd" as the extension? I don't like it.

They should have named it *letter*-zip, along the lines of gzip, bzip, and xzip, with the extension *letter*z. "fz" would have been a good one, since they work at Facebook.