> Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off.

This would have been very handy about 17 years ago when I was doing a virtual CD-ROM driver, and needed to store compressed images, and the client wanted a wide range of speed/compression options for the user--much wider than zlib offered.

I ended up doing a hack that was either the most disgusting thing I have ever done, or was brilliant. I have not been able to decide which.

My hack: I gave them a compression/speed slider that went from 0 to 100. If the slider was set to N, I would apply zlib at maximum compression to N consecutive blocks, and then apply no compression to 100-N consecutive blocks. Repeat until the whole image is stored.

The client loved it.
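For the curious, the scheme reduces to something like the sketch below. It's only an illustration of the idea as described, not the original driver code: the 64 KB block size is a made-up placeholder, and instead of actually storing blocks it just tallies how many bytes the stored image would take.

```c
#include <stdlib.h>
#include <zlib.h>

#define BLOCK_SIZE (64 * 1024)          /* hypothetical block size */

/* Out of every 100 consecutive blocks, deflate the first `slider` at
 * zlib's maximum level and store the remaining 100 - slider untouched. */
size_t slider_compressed_size(const unsigned char *image, size_t image_len,
                              int slider /* 0..100 */)
{
    uLong out_cap = compressBound(BLOCK_SIZE);  /* zlib worst-case bound */
    unsigned char *out = malloc(out_cap);
    size_t total = 0, offset = 0;
    int position = 0;                   /* index within the current group of 100 */

    while (out != NULL && offset < image_len) {
        size_t block_len = image_len - offset;
        if (block_len > BLOCK_SIZE)
            block_len = BLOCK_SIZE;

        uLongf out_len = out_cap;
        if (position < slider &&
            compress2(out, &out_len, image + offset, block_len,
                      Z_BEST_COMPRESSION) == Z_OK)
            total += out_len;           /* deflated block */
        else
            total += block_len;         /* raw block */

        offset += block_len;
        position = (position + 1) % 100;
    }
    free(out);
    return total;
}
```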
zstd is incredible, but just in case the thought hasn't occurred to someone here who may benefit from it: if you're in control of both the send and the receive, type-specific compression is hard to beat.

For example, if you know you're dealing with text you can use snappy; for images, webp; for video, x264 (or x265 if you only care about decode speed and encoded size); and so on, falling back to zstd only when you don't have a specific compressor for the file type in question.
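The routing decision itself is simple to sketch. Assuming the content type has already been identified somehow, it boils down to something like the following (the type strings and codec names below are purely illustrative):

```c
#include <string.h>

/* Codec choices; the names mirror the ones mentioned above. */
typedef enum { CODEC_SNAPPY, CODEC_WEBP, CODEC_X264, CODEC_ZSTD } codec_t;

/* Pick a type-specific codec when we recognize the content type,
 * otherwise fall back to general-purpose zstd. A real system might
 * key off MIME type or file extension instead of these strings. */
codec_t choose_codec(const char *content_type)
{
    if (strcmp(content_type, "text") == 0)  return CODEC_SNAPPY;
    if (strcmp(content_type, "image") == 0) return CODEC_WEBP;
    if (strcmp(content_type, "video") == 0) return CODEC_X264;
    return CODEC_ZSTD;   /* no specific compressor known for this type */
}
```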
Zstd recently added a long range mode that can find matches up to 2 GB in the past using a specialized algorithm [1]. It can be enabled on the command line with `zstd --long` for a 128 MB window, or `--long=windowLog` for a `2^windowLog` byte window.

[1] https://github.com/facebook/zstd/releases/tag/v1.3.2
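For anyone calling the library instead of the CLI, the rough equivalent looks like the sketch below, assuming a zstd version where the advanced parameter API (`ZSTD_CCtx_setParameter` / `ZSTD_compress2`) is available; in older releases the same knobs live behind the experimental API under different names.

```c
#include <zstd.h>

/* Enable long-distance matching with a 2^27 = 128 MB window, roughly what
 * `zstd --long` does on the command line. Returns 0 on failure, otherwise
 * the number of compressed bytes written to dst. */
size_t compress_long_range(void *dst, size_t dstCap, const void *src, size_t srcSize)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    if (cctx == NULL) return 0;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);

    size_t const written = ZSTD_compress2(cctx, dst, dstCap, src, srcSize);
    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(written) ? 0 : written;
}
```

One caveat: windows larger than the decoder's default limit have to be allowed explicitly on the receiving end (e.g. `zstd -d --long=windowLog` on the command line), otherwise decompression of the frame is refused.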
We added zstd to largely supersede gzip use at work. It performs better than old-school gzip in both compression ratio and speed.

That said, I'm not sure I'd call it a "real-time" compression algorithm. It's still a factor of 2x slower than lz4 at compression and a factor of 4x slower at decompression.
A curious fact I encountered: First, the zstdcat executable that ships with zstd (actually a symlink to zstd itself, roughly equivalent to "zstd -cqdf") can decompress .gz files. Second, zstd is *faster* than gzip at decompressing .gz files, taking about 2/3 as long.

I'm not kidding. I couldn't believe it myself, but subsequent testing stubbornly bore it out, both on one file that was 15 MB after compression and on a mix of smaller files. I tried compiling gzip from source, using the same compiler I used for zstd, and the results were the same. strace seemed to show zstd reading and writing in chunks 2x the size, but the number of syscalls didn't seem to be nearly enough to explain the difference. zstd also has a "zlibWrapper" subdirectory; its README has some benchmark numbers that I don't fully understand, but some of them seem to match the 2/3 factor I got.

I'm wondering if this is a clever tactic to drive adoption: get people to use the zstd executable even when they're still using gzipped files. ;-)

Also, the fact that it has support for reading (and, apparently, writing) gzip, lz4, and xz on top of its own format really makes "z standard" an appropriate name.
Supported on Btrfs since kernel 4.14. Like the other compression options, it's a mount-time option (`compress=zstd`).

Edit: Also in squashfs. Here's the git pull request, which includes some benchmarks:
<a href="https://lkml.org/lkml/2017/9/11/369" rel="nofollow">https://lkml.org/lkml/2017/9/11/369</a>
I wonder how the RAD Game Tools compressors would fit into that benchmark list. In their benchmarks, Oodle Selkie has roughly the same ratio as zlib but is 3-4x faster at decompression than Zstd (not the same benchmark, though). http://www.radgametools.com/oodle.htm
We've been using zstd for a while in production, switching over from rar/gzip/7z for archiving huge XML files. Its speed-to-compression ratio is really impressive. Hats off to the development team.
I use the C bindings for Zstandard, and it performs very nicely as an algorithm. I wish the documentation were a little easier to use. I am still not sure whether the provided examples really handle the degenerate cases of /dev/urandom and /dev/zero properly when using `ZSTD_DStream{In,Out}Size` and friends.

In any case, thanks for releasing this; it's been very helpful to me.
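For reference, here is a condensed sketch of the streaming decompression loop, in the same spirit as the streaming_decompression example that ships with zstd (error handling abbreviated). The detail that matters for inputs like /dev/zero is the inner loop: a single input chunk can expand into many output chunks, so `ZSTD_decompressStream` has to be called repeatedly until the input chunk is fully consumed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Decompress a zstd stream from fin to fout using the recommended
 * buffer sizes. Returns 0 on success, -1 on any error. */
static int decompress_file(FILE *fin, FILE *fout)
{
    size_t const inCap  = ZSTD_DStreamInSize();   /* recommended input chunk size */
    size_t const outCap = ZSTD_DStreamOutSize();  /* recommended output chunk size */
    void *inBuf  = malloc(inCap);
    void *outBuf = malloc(outCap);
    ZSTD_DStream *ds = ZSTD_createDStream();
    int err = (!inBuf || !outBuf || !ds);

    if (!err) {
        ZSTD_initDStream(ds);
        size_t readBytes;
        while (!err && (readBytes = fread(inBuf, 1, inCap, fin)) > 0) {
            ZSTD_inBuffer input = { inBuf, readBytes, 0 };
            /* Drain all output produced by this input chunk; highly
             * compressible data (e.g. /dev/zero) can expand a lot. */
            while (!err && input.pos < input.size) {
                ZSTD_outBuffer output = { outBuf, outCap, 0 };
                size_t const rc = ZSTD_decompressStream(ds, &output, &input);
                if (ZSTD_isError(rc))
                    err = 1;
                else
                    fwrite(outBuf, 1, output.pos, fout);
            }
        }
    }

    ZSTD_freeDStream(ds);
    free(inBuf);
    free(outBuf);
    return err ? -1 : 0;
}
```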
Some good discussion in earlier postings too: https://hn.algolia.com/?query=zstandard&sort=byPopularity&prefix&page=0&dateRange=all&type=story
Anyone using zstd along with an incremental rsync approach?
Like gzip --rsyncable?

Say I have 1000 files. I want to compress them and let the cron rsync do its thing. The next day, if only one file has changed, rsync should pick up only the differential instead of the whole archive.

Or is there a better way of doing it?
In the "Compression Speed vs Ratio" graph, the Y-axis doesn't start at zero. If it were changed to start at zero, it would be easier to evaluate the magnitude of the compression ratio change at a glance. IMO, that's probably worth the extra whitespace in the graph.
Last time this was discussed, Zstandard didn't have a splittable mode, and by the looks of it [1] they still don't. That doesn't make it a bad algorithm; it just means that it's not a good choice yet for Hadoop et al. As far as I know, no container format has implemented Zstandard yet.

Does anyone know any better? It seems like we could use a better alternative to Snappy.

[1] https://github.com/facebook/zstd/issues/395
In a lot of big data workflows, zstd is a no-brainer replacement for gzip. Decompression is always faster, plus you can save tons of storage and data transfer.
When one talks about general data compression, isn't it relevant that it generally applies to text compression, and not video or audio?

I guess general data compression works on audio and video, but most of the time you either choose to compress text, audio, or video, or you create a file format that indexes your data.
Not really on topic, but I deal with a bunch of 8 GB images of device firmware. Is there anything that can make compressing and de-duplicating these images fast? I have used borg in the past, but archive operations just seem slow.
See the compression benchmark at https://github.com/powturbo/TurboBench
TL;DR: Facebook acquires "Pied Piper" to gain exclusive rights on the Mean-Jerk-Time (MJT) compression patent; renames it to Zstandard for political correctness.