Parallel decompression of gzip-compressed files

3 points by nkrumm almost 6 years ago

2 comments

nkrumm almost 6 years ago
GitHub: https://github.com/Piezoid/pugz

From the readme:

"Contrary to the pigz program, which does single-threaded decompression (see https://github.com/madler/pigz/blob/master/pigz.c#L232), pugz found a way to do truly parallel decompression. In a nutshell: the compressed file is split into consecutive sections, processed one after the other. Sections are in turn split into chunks (one chunk per thread) and decompressed in parallel. A first pass decompresses chunks and keeps track of back-references (see e.g. our paper for the definition of that term), but is unable to resolve them. Then, a quick sequential pass is done to resolve the contexts of all chunks. A final parallel pass translates all unresolved back-references and outputs the file."
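Purely to illustrate the three-pass structure described in that readme excerpt, here is a minimal Python sketch on a toy LZ77-style token stream (literal bytes plus (distance, length) back-references). This is not DEFLATE and not pugz's actual code; all names and the token format are made up for the example. Chunks are decoded independently, a cheap sequential pass lays their outputs end to end so each chunk's preceding context is known, and a final pass fills in the back-references that reached before each chunk's start.

    # Toy sketch of the three-pass idea (NOT pugz's real code, and not DEFLATE):
    # the "compressed" input is a list of tokens, each either a literal byte or a
    # (distance, length) back-reference, already split into chunks.
    from concurrent.futures import ThreadPoolExecutor

    def decode_chunk(tokens):
        # Pass 1 (parallel): decode one chunk in isolation. Back-references that
        # reach before the chunk boundary cannot be resolved yet; emit a None
        # placeholder and remember (position, distance) for later.
        out, unresolved = [], []
        for tok in tokens:
            if isinstance(tok, bytes):          # literal byte
                out.append(tok[0])
            else:                               # (distance, length) back-reference
                dist, length = tok
                for _ in range(length):
                    src = len(out) - dist
                    if src >= 0 and out[src] is not None:
                        out.append(out[src])
                    else:
                        unresolved.append((len(out), dist))
                        out.append(None)
        return out, unresolved

    def parallel_decode(chunks, workers=4):
        # Pass 1: decode every chunk independently (a thread pool here just shows
        # the structure; CPython threads won't speed up pure-Python work).
        with ThreadPoolExecutor(max_workers=workers) as ex:
            results = list(ex.map(decode_chunk, chunks))

        # Pass 2 (quick, sequential): lay the chunks out back to back so each one
        # knows the absolute offset of the context that precedes it.
        flat, offsets = [], []
        for out, _ in results:
            offsets.append(len(flat))
            flat.extend(out)

        # Pass 3: translate the unresolved back-references now that the preceding
        # context exists. Walking in file order guarantees every source byte is
        # already filled in, since back-references only point backwards. (pugz
        # does this final pass in parallel; it is kept sequential here for brevity.)
        for (_, unresolved), base in zip(results, offsets):
            for pos, dist in unresolved:
                flat[base + pos] = flat[base + pos - dist]
        return bytes(flat)

    # Example: two chunks; the second starts with a back-reference into the first.
    chunks = [
        [b"a", b"b", b"c", (3, 6)],   # decodes to "abcabcabc"
        [(3, 3), b"X"],               # needs the last 3 bytes of the previous chunk
    ]
    print(parallel_decode(chunks))    # b'abcabcabcabcX'

The hard part pugz actually solves, which this toy skips entirely, is that real DEFLATE chunks also start with unknown Huffman state and a 32 KiB window owed to earlier data, so finding valid chunk start points is itself non-trivial.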
LinuxBender almost 6 years ago
Somewhat related: for bzip2, I use pbzip2, which uses all the cores, or as many as you specify. [1] It is in the EPEL repo for RHEL/CentOS/Fedora.

[1] https://linux.die.net/man/1/pbzip2