TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Zstandard Worked Example

175 pointsby stargraveabout 3 years ago

7 comments

dietricheppabout 3 years ago
Nice writeup! FSE, one of the parts of Zstandard, is a very elegant entropy coder. If you&#x27;re following this section of the article, you might be wondering how it&#x27;s encoded, since the encoding process is somewhat less obvious than for Huffman coding.<p>What causes trouble is simply, &quot;Once you&#x27;ve emitted a symbol, how do you know that the next symbol is in the range [BL, BL+2^NB)?&quot; The answer is that the encoding is performed backwards, starting from the end of the input, and the table is constructed so that for any symbol emitted, you can find a <i>previous</i> symbol so that the <i>current</i> symbol is in the previous symbol&#x27;s [BL, BL+2^NB) range.<p>There&#x27;s another writeup about FSE here, which goes into more depth about FSE (rather than talking about Zstandard in general):<p><a href="http:&#x2F;&#x2F;fastcompression.blogspot.com&#x2F;2013&#x2F;12&#x2F;finite-state-entropy-new-breed-of.html" rel="nofollow">http:&#x2F;&#x2F;fastcompression.blogspot.com&#x2F;2013&#x2F;12&#x2F;finite-state-ent...</a><p>My intuition tells me that there&#x27;s probably a homomorphism from Huffman coding to FSE that preserves the encoded size exactly, but I haven&#x27;t done the math to check.
评论 #31433109 未加载
评论 #31430641 未加载
terrellnabout 3 years ago
Awesome post, it&#x27;s great to see someone diving into the details of Zstd!<p>Maintainer of Zstd here, AMA
评论 #31429308 未加载
评论 #31428982 未加载
评论 #31431390 未加载
评论 #31433112 未加载
评论 #31429015 未加载
keyleabout 3 years ago
Interesting, I&#x27;ve never heard of it before. I was aware that there were better standards than zip, however adoption was always an issue with licenses. If this is as open, is it purely an issue of wide adoption?<p>We&#x27;re stuck in the C++ paradigm (I&#x27;m sure there is a better name), where everyone agree this isn&#x27;t &quot;ideal&quot;, that there is better, but not widely adopted enough?
评论 #31428578 未加载
评论 #31428568 未加载
评论 #31428705 未加载
评论 #31428457 未加载
kazinatorabout 3 years ago
Is there some smaller version of this, suitable for embedding into programs?<p><pre><code> $ size &#x2F;usr&#x2F;lib&#x2F;i386-linux-gnu&#x2F;libzstd.so.1.3.3 text data bss dec hex filename 508674 492 20 509186 7c502 &#x2F;usr&#x2F;lib&#x2F;i386-linux-gnu&#x2F;libzstd.so.1.3.3 </code></pre> Yikes; half a meg of code!<p>Say I just want something in my C program for saving certain files using less space, and only care about decompressing only those files. Is there some small implementation of a subset of Zstandard: say no more than four or five source files, and 64K of machine code?<p>If I add 500K to the program, I&#x27;m going to have to save 500K by compressing the accompanying files, just to begin breaking even.<p>For comparison, libz may not have the decompression speed or the ratio, but it&#x27;s more than four times smaller:<p><pre><code> $ size &#x2F;lib&#x2F;i386-linux-gnu&#x2F;libz.so.1.2.11 text data bss dec hex filename 115537 588 4 116129 1c5a1 &#x2F;lib&#x2F;i386-linux-gnu&#x2F;libz.so.1.2.11 </code></pre> This is more like it!<p><pre><code> $ size &#x2F;lib&#x2F;i386-linux-gnu&#x2F;libbz2.so.1.0.4 text data bss dec hex filename 60511 3520 4 64035 fa23 &#x2F;lib&#x2F;i386-linux-gnu&#x2F;libbz2.so.1.0.4 </code></pre> Text 60511. Is there some configuration of Zstandard which can cram into comparable code space?
评论 #31432343 未加载
评论 #31432556 未加载
评论 #31437453 未加载
评论 #31435134 未加载
评论 #31432420 未加载
londons_exploreabout 3 years ago
This post talksabout literals still being encoded with huffman tables and stuff.<p>However, it appears this must be optional, because short strings compressed with zstd pass through &#x27;unencoded&#x27;, apart from a header and footer added:<p><pre><code> $ echo &quot;hello world&quot; | zstd | hexdump -C 00000000 28 b5 2f fd 04 58 61 00 00 68 65 6c 6c 6f 20 77 |(.&#x2F;..Xa..hello w| 00000010 6f 72 6c 64 0a 8c 6d 7d 20 |orld..m} | </code></pre> As an aside, I&#x27;m fairly disappointed that zstd didn&#x27;t use a single byte header for uncompressible strings, so that they could guarantee that the compressed data will never be more than 1 byte larger than the source data. That property is very useful where lots of tiny strings are being compressed, such as in a database.
评论 #31428596 未加载
评论 #31428521 未加载
评论 #31428694 未加载
userbinatorabout 3 years ago
LZ followed by entropy coding seems to be the general strategy of choice for many compression algorithms, so this design of Zstandard was quite familiar to me. The FSE reminded me of the Q&#x2F;MQ&#x2F;QM-coder algorithm which is used in some image formats (and is surprisingly simple yet powerful, although relatively slow.)<p><i>Nonetheless, I like to learn a file format by studying a worked example at the bits and bytes level.</i><p>I did the same when I was playing around with various image and video codecs (JPEG, GIF, MPEG-1&#x2F;2, etc.), and I agree that it definitely helps with understanding as well as debugging; you can get to the point of being able to decode, slowly, by just staring at a hexdump and &quot;reading&quot; the bits and bytes as if it was a new language.
netheril96about 3 years ago
My biggest gripe with zstd is that it is difficult to find a Windows program to operate on them. I had to manually download zstd and a bunch of dependencies manually from msys2 in order to decompress.
评论 #31432268 未加载