
Lossless decompression and the generation of random samples

44 points by joelg236 · almost 12 years ago

3 comments

DanBC · almost 12 years ago
> *Huffman coding tries to compress text one letter at a time on the assumption that each letter comes from some fixed and known probability distribution. If the algorithm is successful then we'd expect the compressed text to look like a uniformly distributed sequence of bits. If it didn't then there'd be patterns that could be used for further compression.*

This can be gently confusing when you're using different compression systems (bits vs. bytes).

(https://groups.google.com/d/topic/lz4c/DcN5SgFywwk/discussion)

Someone there was compressing very large log files; they then compressed the output and got further reductions in size.

> *The fundamental reason is that these highly repetitive byte sequences, with very small and regular differences, produce repetitive compressed sequences, which can therefore be compressed further.* - Yann Collet
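A quick way to see the "compressed output looks uniform" claim is to measure the byte-value entropy of some repetitive, log-like text before and after compression. The sketch below is my own construction using Python's zlib and made-up log lines (not the files from the linked thread): a well-compressed stream sits near 8 bits/byte and gains almost nothing from a second pass, whereas the LZ4 case in the link is a fast compressor leaving enough structure behind to compress again.

```python
import math
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (max 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic log-like text: highly repetitive lines with small, regular differences.
lines = [f"2013-08-09 12:{(i // 60) % 60:02d}:{i % 60:02d} GET /item?id={i} HTTP/1.1 200\n"
         for i in range(5000)]
raw = "".join(lines).encode()

packed = zlib.compress(raw, 9)
print(f"raw:        {len(raw):7d} bytes, {byte_entropy(raw):.2f} bits/byte")
print(f"compressed: {len(packed):7d} bytes, {byte_entropy(packed):.2f} bits/byte")

# A second pass over already-dense output typically gains almost nothing,
# which is what the quoted "uniformly distributed bits" claim predicts.
print(f"recompressed: {len(zlib.compress(packed, 9))} bytes")
```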
hadronzoo · almost 12 years ago
The same is true of arithmetic coding, which separates the probability model from the encoding process. Feed an arithmetic coder a stream of random bits and it will efficiently sample from your model. See section 6.3: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/105.124.pdf
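To make that concrete: run an arithmetic *decoder* over a stream of fair coin flips and the decoded symbols are samples from the model, at roughly the entropy's worth of bits per draw. Below is a minimal toy sketch of the idea (my own construction, not the code from MacKay's book); it skips interval renormalization, so floating-point precision limits it to a few dozen draws, but it shows the mechanism.

```python
import random

def arithmetic_sampler(probs, n_draws):
    """Sample from the categorical distribution `probs` by feeding uniform
    random bits into an arithmetic decoder.

    Toy version: no interval renormalization, so float precision limits it to
    a few dozen draws.  Returns (samples, number_of_random_bits_consumed).
    """
    # Cumulative sub-intervals for each symbol on [0, 1).
    cdf, acc = [], 0.0
    for sym, p in probs.items():
        cdf.append((sym, acc, acc + p))
        acc += p

    low, high = 0.0, 1.0    # current decoding interval
    code, scale = 0.0, 0.5  # binary fraction built from the bits read so far
    samples, bits_used = [], 0

    while len(samples) < n_draws:
        for sym, c_lo, c_hi in cdf:
            s_lo = low + (high - low) * c_lo
            s_hi = low + (high - low) * c_hi
            # Unread bits can add at most 2*scale to `code`; if that whole
            # range fits inside one symbol's sub-interval, the symbol is decided.
            if s_lo <= code and code + 2 * scale <= s_hi:
                samples.append(sym)
                low, high = s_lo, s_hi
                break
        else:
            # Not enough information yet: consume one more random bit.
            code += random.getrandbits(1) * scale
            scale /= 2
            bits_used += 1
    return samples, bits_used

probs = {"a": 0.5, "b": 0.25, "c": 0.25}   # entropy = 1.5 bits per symbol
samples, bits_used = arithmetic_sampler(probs, 20)
print("".join(samples), f"({bits_used} bits for 20 draws)")
```

With the example distribution above, the bits consumed per draw come out close to its 1.5-bit entropy, which is the "efficiently sample" part of the comment.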
Chris2048 · almost 12 years ago
I simply don't understand this; I don't think I have the right background. Any good starting places for finding more about this topic?
Comment #6178361 not loaded
Comment #6178370 not loaded
Comment #6178582 not loaded