Many times faster (de)compression using multiple processors.

37 points · by igorhvr · about 13 years ago

7 comments

alecco · about 13 years ago
It's good that people get interested in the subject. But this is very odd and has some errors. For example, xz requires a lot more memory than bzip2 (see the benchmarks below, Mem column).

http://mattmahoney.net/dc/text.html

http://mattmahoney.net/dc/uiq/

Matt Mahoney maintains the best benchmarks on text and generic compression. Some of the best in the field (like Matt) usually hang out at encode.ru.
Comment #3678434 not loaded
Comment #3678969 not loaded
bmm6o · about 13 years ago
Is this decompressing a single stream on multiple processors? My knowledge of gzip is very limited, but I would have thought sequential processing was required. What's the trick here? (TFA doesn't explain anything, and neither does, e.g., the pigz homepage.)
Comment #3678184 not loaded
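[Editor's note: the trick is on the compression side rather than decompression. pigz splits the input into 128 KiB blocks and compresses them on separate threads; decompressing an ordinary single gzip stream is indeed essentially sequential (pigz only offloads reading, writing, and the check calculation to other threads there). Below is a minimal sketch of a simpler multi-member variant of the idea, using Python's standard gzip and multiprocessing modules — an illustration, not pigz's actual implementation, which keeps a single deflate stream and primes each block with the previous block's trailing 32 KiB as a dictionary. Chunk size and worker count are illustrative.

    # Toy parallel gzip: compress independent chunks, then concatenate.
    # RFC 1952 allows a .gz file to hold multiple members, so standard
    # gzip tools decompress the concatenation as one file.
    import gzip
    from multiprocessing import Pool

    CHUNK_SIZE = 128 * 1024  # pigz's default block size is 128 KiB

    def compress_chunk(chunk: bytes) -> bytes:
        # Each chunk becomes a complete, self-contained gzip member.
        return gzip.compress(chunk)

    def parallel_gzip(data: bytes, workers: int = 4) -> bytes:
        chunks = [data[i:i + CHUNK_SIZE]
                  for i in range(0, len(data), CHUNK_SIZE)]
        with Pool(workers) as pool:
            # Concatenated members still form a valid gzip stream.
            return b"".join(pool.map(compress_chunk, chunks))

    if __name__ == "__main__":
        blob = b"highly repetitive example data\n" * 100_000
        packed = parallel_gzip(blob)
        assert gzip.decompress(packed) == blob  # decodes every member
        print(len(blob), "->", len(packed), "bytes")

The price of independent chunks is a slightly worse compression ratio, since each block starts with an empty history window; pigz's dictionary sharing recovers most of that.]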
joshbaptiste · about 13 years ago
Had to try this on my quad-core laptop, as I had never heard of these tools.

    josh@snoopy:~/Downloads $ grep -m2 -i intel /proc/cpuinfo
    vendor_id  : GenuineIntel
    model name : Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    josh@snoopy:~/Downloads $ ls -l test
    -rw-r--r-- 1 josh josh 1073741824 2012-03-07 20:06 test
    josh@snoopy:~/Downloads $ time gzip test

    real    0m16.430s
    user    0m10.210s
    sys     0m0.490s

    josh@snoopy:~/Downloads $ time pigz test

    real    0m5.028s
    user    0m16.040s
    sys     0m0.620s

Looks good, although the man page describes pigz as "an almost compatible replacement for the gzip program".
Comment #3678361 not loaded
Comment #3678349 not loaded
isocpprar · about 13 years ago
Is xz less resource-intensive than bzip2? My testing (admittedly two years ago or so) showed significant differences: a better compression ratio with xz, but significantly longer runtimes and/or more memory used.
Comment #3678259 not loaded
PaulHoule · about 13 years ago
If you're handling a lot of data, it makes sense to hash-partition it on some key and spread it out across a large number of files.

In that case you might have, say, 512 partitions, and you can farm out compression, decompression, and other tasks to as many CPUs as you want, even to other machines in a cluster.
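[Editor's note: a minimal sketch of that scheme in Python, assuming newline-delimited records keyed on their first tab-separated field — the file names, key rule, and partition count are illustrative assumptions, not from the comment.

    # Hash-partition records into 512 files, then compress the
    # partitions in parallel. Each partition is self-contained, so
    # the compression step fans out across cores (or machines).
    import gzip
    import shutil
    import zlib
    from multiprocessing import Pool

    PARTITIONS = 512

    def partition_of(key: str) -> int:
        # Stable hash: the same key always lands in the same partition.
        return zlib.crc32(key.encode()) % PARTITIONS

    def split(input_path: str) -> None:
        # 512 simultaneous handles fits within typical ulimit defaults.
        outs = [open(f"part-{i:03d}.txt", "w") for i in range(PARTITIONS)]
        with open(input_path) as f:
            for line in f:
                key = line.split("\t", 1)[0]
                outs[partition_of(key)].write(line)
        for out in outs:
            out.close()

    def compress(i: int) -> None:
        with open(f"part-{i:03d}.txt", "rb") as src, \
                gzip.open(f"part-{i:03d}.txt.gz", "wb") as dst:
            shutil.copyfileobj(src, dst)

    if __name__ == "__main__":
        split("input.tsv")  # hypothetical input file
        with Pool() as pool:  # one worker per CPU by default
            pool.map(compress, range(PARTITIONS))

A plain process pool is used for the compress step here, but since the partitions are independent files, the same work could just as easily be distributed across a cluster.]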
mappu · about 13 years ago
I like to use PPMd (via 7zip) for large volumes of text, but it seems to be single-threaded only, which is a shame. It shaves a further 30% or so off the size of the .xml.bz2 dumps that Wikipedia provides.
dhruvbird · about 13 years ago
This is awesome, since parallel compression has been largely neglected in practice.
Comment #3678665 not loaded