TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


Improving compression at scale with Zstandard

139 points by felixhandte over 6 years ago

10 comments

karavelov over 6 years ago

> Two years ago, Facebook open-sourced Zstandard v1.0...

Bullshit, Zstd was open source from the very beginning; they just hired Yann and moved the project under the facebook org. How do I know? I have written the JVM bindings [1] since v0.1 that are now used by Spark, Kafka, etc.

EDIT: Actually, my initial bindings were against v0.0.2 [2]

Kudos to FB for hiring him and helping Zstd get production-ready. This is just a false PR claim.

[1] https://github.com/luben/zstd-jni

[2] https://github.com/luben/zstd-jni/commit/3dfe760cbb8cc46da3268af6aa73dce6014298ef
IvanK_net over 6 years ago

My browser loaded that website with the header: accept-encoding: gzip, deflate, br ("br" means Brotli, by Google)

The response had the header: content-encoding: gzip

Zstandard looks like an improvement on DEFLATE (= gzip = zlib), and its specification is only 3x longer, even though it was introduced 22 years later: https://tools.ietf.org/html/rfc8478

Since Zstandard is so simple and efficient, I thought it would get into browsers very quickly. Then it could make sense to compress even PNG or JPG images, which are usually impossible to compress with DEFLATE.
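The parent's last point can be checked with the Python stdlib: DEFLATE (zlib) shrinks redundant text dramatically but gains nothing on high-entropy data. Random bytes from os.urandom stand in here for already-compressed image formats, which look statistically similar:

```python
import os
import zlib

# Text-like data is highly redundant and compresses well under DEFLATE...
text = b"the quick brown fox jumps over the lazy dog\n" * 1000
# ...while already-compressed image data is close to random bytes,
# which os.urandom approximates here.
noise = os.urandom(len(text))

text_ratio = len(zlib.compress(text, 9)) / len(text)
noise_ratio = len(zlib.compress(noise, 9)) / len(noise)

print(f"text:  {text_ratio:.3f}")   # far below 1.0
print(f"noise: {noise_ratio:.3f}")  # about 1.0: pure overhead, no gain
```

Any DEFLATE-family codec, zstd included, hits the same wall on data that is already near its entropy limit.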
valarauca1 over 6 years ago

I'd really like to thank Cyan for their contributions. `zstd` and `lz4` are great. I'm pretty much exclusively using `zstd` for my tarball needs these days, as it beats the pants off `gzip`, and for plain-text code (most of what I compress) it performs amazingly. (Shameless self-promotion) I wrote my own tar clone to make use of it [1].

It is nice to have disk IO be the limiting factor on decompression even when you are using NVMe drives.

[1] https://github.com/valarauca/car
stochastic_monk over 6 years ago
The best thing about zstd is its zlibWrapper, which lets you write code as if you’re consuming zlib-compressed files while transparently working with zlib-, zstd-, or uncompressed files. I build several of my tools with zstd for this reason.
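The wrapper's transparency comes down to sniffing magic bytes and dispatching to the right decoder. A minimal Python sketch of the same idea (open_transparent is a hypothetical helper invented here, not part of zstd's zlibWrapper; real zstd decoding would additionally need a binding such as the third-party zstandard package):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"

def open_transparent(path):
    """Open a possibly-compressed file for binary reading, sniffing the
    magic bytes to decide between plain, gzip, and zstd content."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic[:2] == GZIP_MAGIC:
        return gzip.open(path, "rb")
    if magic == ZSTD_MAGIC:
        # A real implementation would dispatch to a zstd decoder here.
        raise NotImplementedError("zstd decoding needs a zstd binding")
    return open(path, "rb")
```

The C zlibWrapper does this at the gzread/gzopen layer, so existing zlib-consuming code recompiles against it unchanged.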
golergka over 6 years ago

I used Zstandard to compress messages in a P2P multiplayer game engine, and, trained on our real-life packets, it got us a 2x-5x improvement. Awesome library; I will use it in any similar project from now on.
josephg over 6 years ago
How does zstd compare with brotli? Would it be a better compression standard for http responses?
koolba over 6 years ago

> And the zstd binary itself includes a dictionary trainer (zstd --train). Building a pipeline for handling compression dictionaries can therefore be reduced to being a matter of gluing these building blocks together in relatively straightforward ways.

What happens if your dictionary trained on user data ends up storing user data and you receive a GDPR destruction request?
jclay over 6 years ago

I find the first chart so hard to understand. The axes need labels, and the color scheme is not ideal. They should use different line styles and add a caption below summarizing the findings. There's a reason journals often require graphs to be formatted this way.

This is a resource I've found helpful: https://www3.nd.edu/~pkamat/pdf/graphs.pdf

"Consider readers with color blindness or deficiencies"

"Avoid colors that are difficult to distinguish"
m0zg over 6 years ago

I hope they pay greater attention to the low and high ends of their compression-ratio spectrum. On the low end, it'd be great if it could exceed lz4 in terms of speed and memory savings. On the high end, it'd be great to exceed XZ/LZMA.

Right now it's impressive "in the middle", but I find myself in a lot of situations where I care about the extremes. E.g. for something that will be transferred a lot, or cold-stored, I want maximum compression, CPU/RAM usage be damned, within reason. So I tend to use LZMA there if files aren't too large. For realtime/network RPC scenarios I want minimum RAM/CPU usage and Pareto-optimality on multi-GbE networks. This is where I use LZ4 (and used to use Snappy/Zippy).

At their scale, though, FB is surely saving many millions of dollars by deploying this, both in human/machine time savings and storage savings.
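The speed-versus-ratio dial the parent describes can be felt even with stdlib zlib, whose levels trade time for ratio the same way zstd's do (the log-line data below is made up for illustration):

```python
import time
import zlib

# Synthetic, highly repetitive log data (invented for illustration).
data = b"".join(
    f"record {i}: status=OK latency={i % 97}ms\n".encode() for i in range(20000)
)

results = {}
for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - t0
    results[level] = len(out)
    print(f"level {level}: ratio {len(out) / len(data):.4f} in {elapsed * 1000:.1f} ms")
```

zstd exposes a much wider dial (negative "fast" levels through level 22), but the shape of the trade-off is the same: the extremes are where lz4 and LZMA still carve out their niches.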
jzawodn over 6 years ago

Am I the only one getting sick of "at scale"?