In other compression news, Apple open sourced their implementation of lzfse yesterday: https://github.com/lzfse/lzfse. It's based on a relatively new type of coding: asymmetric numeral systems. Huffman coding is only optimal if you consider one bit as the smallest unit of information. ANS (and more broadly, arithmetic coding) allows for fractional bits and gets closer to the Shannon limit. It's also simpler to implement than (real-world) Huffman.

Unfortunately, most open source implementations of ANS are not highly optimized and are quite division-heavy, so they lag on speed benchmarks. Apple's implementation looks pretty good (they're using it in OS X, err, macOS, and iOS), and there's some promising academic work being done on better implementations (optimizing Huffman for x86, ARM, and FPGA is a pretty well-studied problem). The compression story is still being written.
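To make the fractional-bit point concrete, here's a toy Python sketch (not from the lzfse source; the skewed two-symbol alphabet is made up purely for illustration) comparing the Shannon limit with the best any prefix code like Huffman can do:

```python
import math

# Toy illustration: for a heavily skewed two-symbol source, Huffman must
# spend at least 1 whole bit per symbol, while the Shannon limit (which
# ANS and arithmetic coders can approach) is far lower.
probs = {"a": 0.95, "b": 0.05}

# Shannon entropy in bits per symbol: H = -sum(p * log2(p))
entropy = -sum(p * math.log2(p) for p in probs.values())

# Best a prefix (Huffman) code can do here: 1 bit for each symbol.
huffman_bits = sum(p * 1 for p in probs.values())

print(f"Shannon limit: {entropy:.3f} bits/symbol")      # ~0.286
print(f"Huffman code : {huffman_bits:.3f} bits/symbol")  # 1.000
```

With probabilities this skewed, an integer-bit code wastes over two thirds of every bit; fractional-bit coders close most of that gap.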
Not only is this a great read, but the follow-up asking for citations is answered with "I am the reference".

If this were reddit I'd post the hot fire gif. Eh, here it is anyway: http://i.imgur.com/VQLGJOL.gif
It's an annoyingly common pattern: the OP doesn't mark this answer as accepted, or even acknowledge how amazing it is coming from one of the technology's creators; instead they just go on to ask a follow-up.
It seems like it wouldn't be that hard to create an indexed tar.gz format that's backwards compatible.

One way would be to use the last file in the tar as the index. As files are added, you remove the index, append the new file, record some basic file metadata and the compressed offset (maybe of the deflate chunk) in the index, update the index size in bytes in a small footer at the end of the index, and append the updated index to the compressed tar.

You can retrieve the index by starting at the end of the compressed archive and reading backwards until you find a deflate header (at most 65k plus a few more bytes, since that's the size of a deflate chunk). If it's an indexed tar, the last file will be the index, and the end of the index will be a footer with the index size (so you know the maximum you'll need to seek back from the end). This isn't extremely efficient, but it is limited in scope, and it's helped by knowing the index size; a rough sketch of such a footer follows below.

You could verify the index by checking some or all of the reported file byte offsets. The worst case is small files, with one or more per deflate chunk, where you would have to visit each chunk. That makes the worst case equivalent to listing the files of an un-indexed tar.gz, plus the (relatively small) overhead of locating and reading the index.

Uncompressing the archive as a regular tar.gz would work normally, just with an additional file (the index) included.

I imagine this isn't popular not because it hasn't been done, but because most people don't really need an index.
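No such format exists as a standard as far as I know, so everything below is hypothetical: the magic string and footer layout are invented for illustration. A minimal Python sketch of the self-describing footer might look like:

```python
import struct

# Hypothetical footer for the indexed-tar.gz idea above. The index is the
# last file in the archive, and its final bytes record the index's own
# size so a reader can seek back from the end of the archive.
# Layout (made up): 8-byte magic string + little-endian uint64 index size.
FOOTER_MAGIC = b"TGZINDEX"
FOOTER = struct.Struct("<8sQ")

def add_footer(index_bytes: bytes) -> bytes:
    # Append the footer to the serialized index before tarring it.
    return index_bytes + FOOTER.pack(FOOTER_MAGIC, len(index_bytes))

def parse_footer(tail: bytes):
    # Given the trailing bytes of the decompressed last file, return the
    # index size, or None if this archive carries no index.
    if len(tail) < FOOTER.size:
        return None
    magic, size = FOOTER.unpack(tail[-FOOTER.size:])
    return size if magic == FOOTER_MAGIC else None
```

A plain gunzip/untar never looks at the footer, so the archive stays backwards compatible: the index simply extracts as one more file.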
A worthwhile read on the coolest kid on the block, the xz compression algorithm (of LZMA fame), plus tar.gz vs. tar.xz scenarios and discussion:

http://stackoverflow.com/questions/6493270/why-is-tar-gz-still-much-more-common-than-tar-xz
The last few days I find myself wondering if there needs to be some kind of org set up to preserve this sort of info.

Right now it seems to be strewn across a myriad of blogs, forums, and whatnot that risk going poof. And even if the Internet Archive picks them up, it is anything but curated (unlike, say, Wikipedia, even with all the warts).
My father teaching me to type PKUNZIP on files that "ended with .zip" in the DOS shell (not long before the Norton Commander kind of GUI arrived on our computer) is one of my earliest memories as a toddler. I would ask him "What does it mean?" and he would simply not know. It was 1990 and I was three and a half, I think. When I finally learned what it stood for, it was kind of epic for me.
It is rare to be able to have a question answered so completely and from such a first-hand source. This post is gold and tickles me in all the right places.

StackOverflow is sitting on a veritable treasure trove of knowledge.
Reminds me of the very sad zip story:

https://www.youtube.com/watch?v=_zvFeHtcxuA

The whole "BBS Documentary" is great, and I recommend starting at the beginning if you're interested in it:

https://www.youtube.com/watch?v=dRap7uw9iWI
One important difference in practice is that zip files need to be saved to disk to be extracted, whereas gzip files can be stream-unzipped: curl http://example.com/foo.tar.gz | tar zxvf - works, but the same isn't possible with zip files. I am not sure if this is a limitation of the unzip tool or of the format itself (the zip central directory lives at the end of the file). I would love to know if there is a workaround.
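The root cause is the format: gzip is a pure forward stream, while a zip reader is expected to seek to the central directory at the end of the file. A small Python sketch (the chunked framing is mine, not from the comment) shows the difference:

```python
import zlib

def stream_gunzip(chunks):
    # Decompress a gzip stream chunk by chunk; no seeking is ever needed,
    # which is exactly why `curl ... | tar zxvf -` works on a pipe.
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header/trailer.
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        yield d.decompress(chunk)
    yield d.flush()

# zipfile, by contrast, must seek to the end of its input to read the
# central directory, so handing it a non-seekable pipe fails up front:
#   zipfile.ZipFile(sys.stdin.buffer)  # raises, stdin can't seek/tell
```

There are partial workarounds: each zip entry is also preceded by a local file header, so libarchive's bsdtar can stream-extract a zip from a pipe, and Info-ZIP's funzip will decompress the first member from stdin; you just lose whatever metadata only the central directory carries.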
I love the discussion in the comments:

> This post is packed with so much history and information that I feel like some citations need be added incase people try to reference this post as an information source. Though if this information is reflected somewhere with citations like Wikipedia, a link to such similar cited work would be appreciated. - ThorSummoner

> I am the reference, having been part of all of that. This post could be cited in Wikipedia as an original source. - Mark Adler
When I read "I am the reference" it reminded me of "I am the danger":

https://www.youtube.com/watch?v=3v_zlyHgazs