Zstandard – Real-time data compression algorithm

286 points by josephscott over 7 years ago

25 comments

tzs over 7 years ago

> Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off

This would have been very handy about 17 years ago when I was doing a virtual CD-ROM driver, and needed to store compressed images, and the client wanted a wide range of speed/compression options for the user--much wider than zlib offered.

I ended up doing a hack that was either the most disgusting thing I have ever done, or was brilliant. I have not been able to decide which.

My hack: I gave them a compression/speed slider that went from 0 to 100. If the slider was set to N, I would apply zlib at maximum compression to N consecutive blocks, and then apply no compression to 100-N consecutive blocks. Repeat until the whole image is stored.

The client loved it.
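A minimal sketch of what that slider scheme could look like in C with zlib; the block size, buffer handling, and function name are illustrative assumptions, not the original driver's code:

```c
/* Hypothetical sketch of the slider hack described above: for a slider
 * value n in [0, 100], deflate n blocks out of every 100 at maximum
 * compression and store the remaining 100 - n blocks raw. */
#include <stdint.h>
#include <string.h>
#include <zlib.h>

#define BLOCK_SIZE 65536  /* assumed block size */

/* Store one block into `out` (which must hold at least
 * compressBound(BLOCK_SIZE) bytes); returns the stored size and sets
 * *compressed so the reader knows how to decode the block later. */
static size_t store_block(const uint8_t *in, size_t in_len, int slider,
                          size_t block_index, uint8_t *out, int *compressed)
{
    if (block_index % 100 < (size_t)slider) {
        uLongf out_len = compressBound((uLong)in_len);
        if (compress2((Bytef *)out, &out_len, (const Bytef *)in,
                      (uLong)in_len, Z_BEST_COMPRESSION) == Z_OK
            && out_len < in_len) {
            *compressed = 1;
            return (size_t)out_len;
        }
    }
    /* Slider says skip this block, or deflate didn't help: store raw. */
    memcpy(out, in, in_len);
    *compressed = 0;
    return in_len;
}
```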
ComputerGuru over 7 years ago

zstd is incredible, but just in case the thought hasn't occurred to someone here who may benefit from it: if you're in control of both the send and the receive, type-specific compression is hard to beat.

For example, if you know you're dealing with text you can use snappy; for images, webp; for videos, x264 (or x265 if you only care about decode speed and encoded size); and so on, falling back to zstd only when you don't have a specific compressor for the file type.
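A sketch of that dispatch idea, assuming a simple extension-to-codec table (the extensions and codec set here are illustrative, not a recommendation):

```c
/* Pick a type-specific codec by file extension; fall back to zstd for
 * anything without a specialized compressor, as the comment suggests. */
#include <string.h>

typedef enum { CODEC_SNAPPY, CODEC_WEBP, CODEC_X264, CODEC_ZSTD } codec_t;

static codec_t pick_codec(const char *filename)
{
    static const struct { const char *ext; codec_t codec; } table[] = {
        { ".txt",  CODEC_SNAPPY },  /* text */
        { ".json", CODEC_SNAPPY },
        { ".bmp",  CODEC_WEBP  },   /* images */
        { ".png",  CODEC_WEBP  },
        { ".avi",  CODEC_X264  },   /* video */
    };
    const char *ext = strrchr(filename, '.');
    if (ext) {
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(ext, table[i].ext) == 0)
                return table[i].codec;
    }
    return CODEC_ZSTD;  /* no specific compressor: general-purpose fallback */
}
```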
beagle3 over 7 years ago

I used to object to using zstd based on Facebook's onerous license and patent policy; but now that zstd is plain BSD+GPLv2, I endorse it.
terrelln over 7 years ago

Zstd recently added a long range mode that can find matches up to 2 GB in the past using a specialized algorithm [1]. It can be enabled on the command line with `zstd --long` for a 128 MB window, or `--long=windowLog` for a `2^windowLog` byte window.

[1] https://github.com/facebook/zstd/releases/tag/v1.3.2
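The same mode can be enabled from C. A hedged sketch, assuming a libzstd new enough to expose the stable advanced-parameter API (ZSTD_CCtx_setParameter and ZSTD_compress2, which landed in releases after the v1.3.2 linked above):

```c
#include <zstd.h>

/* Compress with long-distance matching and a 2^27 = 128 MB window,
 * mirroring `zstd --long` on the command line. Returns the compressed
 * size; callers should check it with ZSTD_isError(). */
size_t compress_long_range(void *dst, size_t dst_cap,
                           const void *src, size_t src_len)
{
    ZSTD_CCtx *cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_enableLongDistanceMatching, 1);
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 27);
    size_t written = ZSTD_compress2(cctx, dst, dst_cap, src, src_len);
    ZSTD_freeCCtx(cctx);
    return written;
}
```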
loeg over 7 years ago

We added zstd to largely supersede gzip use at work. It performs better than old-school gzip in both compression ratio and speed.

That said, I'm not sure I'd call it a "real-time" compression algorithm. It's still a factor of 2x slower than lz4 at compression and a factor of 4x slower at decompression.
waterhouse over 7 years ago

A curious fact I encountered: First, the zstdcat executable that ships with zstd (actually a symlink to zstd itself, roughly equivalent to "zstd -cqdf") can decompress .gz files. Second, zstd is *faster* than gzip at decompressing .gz files, taking about 2/3 as long.

I'm not kidding. I couldn't believe it myself, but subsequent testing stubbornly bore it out—on one file that was 15 MB after compression, and on a mix of smaller files. I tried compiling gzip from source, using the same compiler I used for zstd, and the results were the same. strace seemed to show zstd read and wrote in chunks 2x the size, but the number of syscalls didn't seem to be nearly enough to explain the difference. It seems to have this "zlibWrapper" subdirectory; its README has some benchmark numbers that I don't fully understand, but some of them seem to match the 2/3 factor I got.

I'm wondering if this is a clever tactic to drive adoption—get people to use the zstd executable even when they're still using gzipped files. ;-)

Also, the fact that it has support for reading (and, apparently, writing) gzip, lz4, and xz on top of its own format really makes "z standard" an appropriate name.
cmurf over 7 years ago

Supported on Btrfs since kernel 4.14. Like the other compression options, it's a mount-time option.

Edit: Also in squashfs. Here's the git pull request, which includes some benchmarks: https://lkml.org/lkml/2017/9/11/369
desertrider12 over 7 years ago

I wonder how the RAD Game Tools compressors would fit into that benchmark list. In their benchmarks, Oodle Selkie has roughly the same ratio as zlib but is 3-4x faster at decompression than Zstd (not the same benchmark, though). http://www.radgametools.com/oodle.htm
jdhawk over 7 years ago

We've been using zstd for a while in production, switching over from rar/gzip/7z for archiving huge XML files. Its speed-to-compression ratio is really impressive. Hats off to the development team.
cjhanks over 7 years ago

I use the C bindings for ZStandard; it performs very nicely as an algorithm. I wish the documentation were a little easier to use. I am still not sure whether the provided examples properly handle the degenerate cases of /dev/urandom and /dev/zero when using `ZSTD_DStream{In,Out}Size` and friends.

In any case - thanks for releasing this, it's been very helpful to me.
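For readers unfamiliar with those functions: a minimal streaming-decompression sketch using the recommended buffer sizes, modeled on zstd's own streaming examples (this is a simplified illustration with abbreviated error handling, not the commenter's code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Decompress a zstd stream from `in` to `out` using the chunk sizes
 * the library recommends via ZSTD_DStreamInSize/ZSTD_DStreamOutSize. */
static void decompress_stream(FILE *in, FILE *out)
{
    size_t const in_cap  = ZSTD_DStreamInSize();   /* recommended input chunk */
    size_t const out_cap = ZSTD_DStreamOutSize();  /* recommended output chunk */
    void *in_buf  = malloc(in_cap);
    void *out_buf = malloc(out_cap);
    ZSTD_DStream *ds = ZSTD_createDStream();
    ZSTD_initDStream(ds);

    size_t read;
    while ((read = fread(in_buf, 1, in_cap, in)) > 0) {
        ZSTD_inBuffer input = { in_buf, read, 0 };
        /* One input chunk can expand to many output chunks (think of
           compressed /dev/zero), so keep flushing until it's consumed. */
        while (input.pos < input.size) {
            ZSTD_outBuffer output = { out_buf, out_cap, 0 };
            size_t const ret = ZSTD_decompressStream(ds, &output, &input);
            if (ZSTD_isError(ret)) {
                fprintf(stderr, "%s\n", ZSTD_getErrorName(ret));
                goto done;
            }
            fwrite(out_buf, 1, output.pos, out);
        }
    }
done:
    ZSTD_freeDStream(ds);
    free(in_buf);
    free(out_buf);
}
```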
bajsejohannes over 7 years ago

Some good discussion in earlier postings too: https://hn.algolia.com/?query=zstandard&sort=byPopularity&prefix&page=0&dateRange=all&type=story
reacharavindh over 7 years ago

Anyone using zstd along with an incremental rsync approach, like gzip --rsyncable?

Say I have 1000 files. I want to compress them and let the cron rsync do its thing. The next day, if only one file has changed, rsync should pick up only the differential instead of the whole archive.

Or is there a better way of doing it?
cakoose over 7 years ago

In the "Compression Speed vs Ratio" graph, the Y-axis doesn't start at zero. If it were changed to start at zero, it would be easier to evaluate the magnitude of the compression-ratio change at a glance. IMO, that's probably worth the extra whitespace in the graph.
revelation over 7 years ago

From that benchmark list I'd prefer lz4...
lars_francke over 7 years ago

Last time this was discussed, Zstandard didn't have a splittable mode, and by the looks of it [1] they still don't. That doesn't make it a bad algorithm; it just means that it's not a good choice yet for Hadoop et al. As far as I know, no container format has implemented Zstandard yet.

Does anyone know better? It seems like we could use a better alternative to Snappy.

[1] https://github.com/facebook/zstd/issues/395
jakozaur over 7 years ago

In a lot of big data workflows, zstd is a no-brainer replacement for gzip: decompression is always faster, plus you can save tons of storage and data transfer.
jokoon over 7 years ago

When one talks about general data compression, isn't it relevant that it generally applies to text compression, and not video or audio?

I guess general data compression works on audio and video, but most of the time you either choose to compress text, audio, or video, or you create a file format that indexes your data.
teej over 7 years ago

We switched many of our Redshift compression encodings to zstd. Works fantastically on text and JSON.
karmicthreat over 7 years ago

Not totally on topic, but I deal with a bunch of 8 GB images of device firmware. Is there anything that can make compressing and de-duplicating these images fast? I have used borg in the past, but archive operations just seem slow.
unixhero over 7 years ago

So like 'middle out'
powturbo over 7 years ago

See the compression benchmark at https://github.com/powturbo/TurboBench
2pointsomone over 7 years ago

That's the best Weissman score I have ever seen.
natbobc over 7 years ago

Why is this site an SPA?!?!?
DyslexicAtheist over 7 years ago

TL;DR: Facebook acquires "Pied Piper" to gain exclusive rights to the Mean-Jerk-Time (MJT) compression patent; renames it to Zstandard for political correctness.
gateway7 over 7 years ago

That patent clause, though. Stick to the fork from before Facebook bought the man and the algorithm.