> Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off.

This would have been very handy about 17 years ago when I was doing a virtual CD-ROM driver, and needed to store compressed images, and the client wanted a wide range of speed/compression options for the user--much wider than zlib offered.

I ended up doing a hack that was either the most disgusting thing I have ever done, or was brilliant. I have not been able to decide which.

My hack: I gave them a compression/speed slider that went from 0 to 100. If the slider was set to N, I would apply zlib at maximum compression to N consecutive blocks, and then apply no compression to 100-N consecutive blocks. Repeat until the whole image is stored.

The client loved it.
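For the curious, the scheme reduces to something like the sketch below. It's only an illustration of the idea as described, not the original driver code: the 64 KB block size is a made-up placeholder, and instead of actually storing blocks it just tallies how many bytes the stored image would take.

```c
#include <stdlib.h>
#include <zlib.h>

#define BLOCK_SIZE (64 * 1024)          /* hypothetical block size */

/* Out of every 100 consecutive blocks, deflate the first `slider` at
 * zlib's maximum level and store the remaining 100 - slider untouched. */
size_t slider_compressed_size(const unsigned char *image, size_t image_len,
                              int slider /* 0..100 */)
{
    uLong out_cap = compressBound(BLOCK_SIZE);  /* zlib worst-case bound */
    unsigned char *out = malloc(out_cap);
    size_t total = 0, offset = 0;
    int position = 0;                   /* index within the current group of 100 */

    while (out != NULL && offset < image_len) {
        size_t block_len = image_len - offset;
        if (block_len > BLOCK_SIZE)
            block_len = BLOCK_SIZE;

        uLongf out_len = out_cap;
        if (position < slider &&
            compress2(out, &out_len, image + offset, block_len,
                      Z_BEST_COMPRESSION) == Z_OK)
            total += out_len;           /* deflated block */
        else
            total += block_len;         /* raw block */

        offset += block_len;
        position = (position + 1) % 100;
    }
    free(out);
    return total;
}
```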
zstd is incredible, but just in case the thought hasn't occurred to someone here who may benefit from it: if you're in control of both the send and the receive, type-specific compression is hard to beat.

For example, if you know you're dealing with text you can use snappy; for images, webp; for video, x264 (or x265 if you only care about decode speed and encoded size); and so on, falling back to zstd only when you don't have a specific compressor for the file type in question.
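The routing decision itself is simple to sketch. Assuming the content type has already been identified somehow, it boils down to something like the following (the type strings and codec names below are purely illustrative):

```c
#include <string.h>

/* Codec choices; the names mirror the ones mentioned above. */
typedef enum { CODEC_SNAPPY, CODEC_WEBP, CODEC_X264, CODEC_ZSTD } codec_t;

/* Pick a type-specific codec when we recognize the content type,
 * otherwise fall back to general-purpose zstd. A real system might
 * key off MIME type or file extension instead of these strings. */
codec_t choose_codec(const char *content_type)
{
    if (strcmp(content_type, "text") == 0)  return CODEC_SNAPPY;
    if (strcmp(content_type, "image") == 0) return CODEC_WEBP;
    if (strcmp(content_type, "video") == 0) return CODEC_X264;
    return CODEC_ZSTD;   /* no specific compressor known for this type */
}
```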
Zstd recently added a long range mode that can find matches up to 2 GB in the past using a specialized algorithm [1]. It can be enabled on the command line with `zstd --long` for a 128 MB window, or `--long=windowLog` for a `2^windowLog` byte window.

[1] https://github.com/facebook/zstd/releases/tag/v1.3.2
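For anyone calling the library instead of the CLI, the rough equivalent looks like the sketch below, assuming a zstd version where the advanced parameter API (`ZSTD_CCtx_setParameter` / `ZSTD_compress2`) is available; in older releases the same knobs live behind the experimental API under different names.

```c
#include <zstd.h>

/* Enable long-distance matching with a 2^27 = 128 MB window, roughly what
 * `zstd --long` does on the command line. Returns 0 on failure, otherwise
 * the number of compressed bytes written to dst. */
size_t compress_long_range(void *dst, size_t dstCap, const void *src, size_t srcSize)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (cctx == NULL) return 0;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);

    size_t const written = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(written) ? 0 : written;
}
```

One caveat: windows larger than the decoder's default limit have to be allowed explicitly on the receiving end (e.g. `zstd -d --long=windowLog` on the command line), otherwise decompression of the frame is refused.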
We added zstd to largely supersede gzip use at work. It performs better than old-school gzip in both compression ratio and speed.

That said, I'm not sure I'd call it a "real-time" compression algorithm. It's still a factor of 2x slower than lz4 at compression and a factor of 4x slower at decompression.
A curious fact I encountered: First, the zstdcat executable that ships with zstd (actually a symlink to zstd itself, roughly equivalent to "zstd -cqdf") can decompress .gz files. Second, zstd is *faster* than gzip at decompressing .gz files, taking about 2/3 as long.

I'm not kidding. I couldn't believe it myself, but subsequent testing stubbornly bore it out, both on one file that was 15 MB after compression and on a mix of smaller files. I tried compiling gzip from source, using the same compiler I used for zstd, and the results were the same. strace seemed to show zstd reading and writing in chunks 2x the size, but the number of syscalls didn't seem to be nearly enough to explain the difference. zstd also has a "zlibWrapper" subdirectory; its README has some benchmark numbers that I don't fully understand, but some of them seem to match the 2/3 factor I got.

I'm wondering if this is a clever tactic to drive adoption: get people to use the zstd executable even when they're still using gzipped files. ;-)

Also, the fact that it has support for reading (and, apparently, writing) gzip, lz4, and xz on top of its own format really makes "z standard" an appropriate name.
Supported on Btrfs since kernel 4.14. Like the other compression options, it's a mount-time option (`compress=zstd`).

Edit: Also in squashfs. Here's the git pull request, which includes some benchmarks:
<a href="https://lkml.org/lkml/2017/9/11/369" rel="nofollow">https://lkml.org/lkml/2017/9/11/369</a>
I wonder how the RAD Game Tools compressors would fit into that benchmark list. In their benchmarks, Oodle Selkie has roughly the same ratio as zlib but is 3-4x faster at decompression than Zstd (not the same benchmark, though). http://www.radgametools.com/oodle.htm
We've been using zstd for a while in production, switching over from rar/gzip/7z for archiving huge XML files. Its speed-to-compression ratio is really impressive. Hats off to the development team.
I use the C bindings for Zstandard, and it performs very nicely as an algorithm. I wish the documentation were a little easier to use. I am still not sure whether the provided examples really handle the degenerate cases of /dev/urandom and /dev/zero properly when using `ZSTD_DStream{In,Out}Size` and friends.

In any case, thanks for releasing this; it's been very helpful to me.
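For reference, here is a condensed sketch of the streaming decompression loop, in the same spirit as the streaming_decompression example that ships with zstd (error handling abbreviated). The detail that matters for inputs like /dev/zero is the inner loop: a single input chunk can expand into many output chunks, so `ZSTD_decompressStream` has to be called repeatedly until the input chunk is fully consumed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Decompress a zstd stream from fin to fout using the recommended
 * buffer sizes. Returns 0 on success, -1 on any error. */
static int decompress_file(FILE *fin, FILE *fout)
{
    size_t const inCap  = ZSTD_DStreamInSize();   /* recommended input chunk size */
    size_t const outCap = ZSTD_DStreamOutSize();  /* recommended output chunk size */
    void *inBuf  = malloc(inCap);
    void *outBuf = malloc(outCap);
    ZSTD_DStream *ds = ZSTD_createDStream();
    int err = (!inBuf || !outBuf || !ds);

    if (!err) {
        ZSTD_initDStream(ds);
        size_t readBytes;
        while (!err && (readBytes = fread(inBuf, 1, inCap, fin)) > 0) {
            ZSTD_inBuffer input = { inBuf, readBytes, 0 };
            /* Drain all output produced by this input chunk; highly
             * compressible data (e.g. /dev/zero) can expand a lot. */
            while (!err && input.pos < input.size) {
                ZSTD_outBuffer output = { outBuf, outCap, 0 };
                size_t const rc = ZSTD_decompressStream(ds, &output, &input);
                if (ZSTD_isError(rc))
                    err = 1;
                else
                    fwrite(outBuf, 1, output.pos, fout);
            }
        }
    }

    ZSTD_freeDStream(ds);
    free(inBuf);
    free(outBuf);
    return err ? -1 : 0;
}
```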
Some good discussion in earlier postings too: https://hn.algolia.com/?query=zstandard&sort=byPopularity&prefix&page=0&dateRange=all&type=story
Anyone using zstd along with an incremental rsync approach?
Like gzip --rsyncable?

Say I have 1000 files. I want to compress them and let the cron rsync do its thing. The next day, if only one file has changed, rsync should pick up only the differential instead of the whole archive.

Or is there a better way of doing it?
In the "Compression Speed vs Ratio" graph, the Y-axis doesn't start at zero. If it were changed to start at zero, it would be easier to evaluate the magnitude of the compression ratio change at a glance. IMO, that's probably worth the extra whitespace in the graph.
Last time this was discussed, Zstandard didn't have a splittable mode, and by the looks of it [1] they still don't. That doesn't make it a bad algorithm; it just means that it's not a good choice yet for Hadoop et al. As far as I know, no container format has implemented Zstandard yet.

Does anyone know any better? It seems like we could use a better alternative to Snappy.

[1] https://github.com/facebook/zstd/issues/395
In a lot of big data workflows, zstd is a no-brainer replacement for gzip. Decompression is always faster, plus you can save tons of storage and data transfer.
When one talks about general data compression, isn't it relevant that it generally applies to text compression, and not video or audio?

I guess general data compression works on audio and video, but most of the time you either choose to compress text, audio, or video, or you create a file format that indexes your data.
Not really on topic, but I deal with a bunch of 8 GB images of device firmware. Is there anything that can make compressing and de-duplicating these images fast? I have used borg in the past, but archive operations just seem slow.
See the compression benchmark at https://github.com/powturbo/TurboBench
TL;DR: Facebook acquires "Pied Piper" to gain exclusive rights on the Mean-Jerk-Time (MJT) compression patent; renames it to Zstandard for political correctness.