It's good that people are getting interested in the subject, but this is very odd and has some errors. For example, xz requires a lot more memory than bzip2 (see the Mem column in the benchmarks below).<p><a href="http://mattmahoney.net/dc/text.html" rel="nofollow">http://mattmahoney.net/dc/text.html</a><p><a href="http://mattmahoney.net/dc/uiq/" rel="nofollow">http://mattmahoney.net/dc/uiq/</a><p>Matt Mahoney maintains the best benchmarks on text and generic compression. Some of the best people in the field (Matt included) usually hang out at encode.ru.
Is this decompressing a single stream on multiple processors? My knowledge of gzip is very limited, but I would have thought sequential processing was required. What's the trick here? (TFA doesn't explain anything, and neither does, e.g., the pigz homepage.)
Had to try this on my quad-core laptop, as I'd never heard of these tools.<p><pre><code> josh@snoopy:~/Downloads $ grep -m2 -i intel /proc/cpuinfo
vendor_id : GenuineIntel
model name : Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
josh@snoopy:~/Downloads $ ls -l test
-rw-r--r-- 1 josh josh 1073741824 2012-03-07 20:06 test
josh@snoopy:~/Downloads $ time gzip test
real 0m16.430s
user 0m10.210s
sys 0m0.490s
josh@snoopy:~/Downloads $ time pigz test
real 0m5.028s
user 0m16.040s
sys 0m0.620s
</code></pre>
Looks good, although the man page describes it as "an almost compatible replacement for the gzip program".
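If anyone wants to poke at it further, a couple of invocations worth timing as well (file names follow the transcript above; -p and -d are documented pigz options, so treat this as a sketch rather than a transcript):<p><pre><code> time pigz -p 2 test      # cap compression at 2 threads
 time pigz -d test.gz     # decompress; equivalent to running unpigz test.gz
</code></pre>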
Is xz less resource-intensive than bzip2? My testing (admittedly two years ago or so) showed significant differences: a better compression ratio with xz, but significantly longer run times and/or more memory used.
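For anyone wanting to repeat that comparison, GNU time's verbose output reports peak memory, so something like the following gives a rough side-by-side (the file name is just a placeholder; -9 and -k are standard xz/bzip2 flags):<p><pre><code> /usr/bin/time -v xz    -9 -k test.txt 2>&1 | grep 'Maximum resident'
 /usr/bin/time -v bzip2 -9 -k test.txt 2>&1 | grep 'Maximum resident'
</code></pre>
The same -v output also includes wall-clock and CPU time, so one run covers both halves of the question.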
If you're handling a lot of data it makes sense to hash-partition it on some key and spread it out across a large number of files.<p>In that case you might have, say, 512 partitions, and you can farm out compression, decompression and other tasks to as many CPUs as you want, even to other machines in a cluster.
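A toy sketch of that shape on a single machine (the tab-separated input, the key in the first column, the 512 buckets and gzip are all illustrative assumptions, not a description of any particular system): hash-partition with gawk, then compress the partitions in parallel with xargs.<p><pre><code> mkdir -p parts

 gawk -F'\t' '
 BEGIN {
     # build a char -> code table so the key can be hashed portably in awk
     for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i
 }
 {
     # toy multiplicative hash of the key (column 1) into 512 buckets
     h = 0
     n = split($1, c, "")
     for (i = 1; i <= n; i++) h = (h * 31 + ord[c[i]]) % 512
     print > ("parts/part-" h ".tsv")
 }' input.tsv

 # compress the partitions 8 at a time
 find parts -name "*.tsv" -print0 | xargs -0 -P 8 -n 1 gzip
</code></pre>
In a cluster you would ship each bucket to whichever node owns that partition instead of compressing everything locally.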
I like to use PPMd (via 7-Zip) for large volumes of text, but it seems to be single-threaded only, which is a shame. It cuts a good 30% more off the size of the .xml.bz2 dumps that Wikipedia provides.
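For reference, this is roughly the 7-Zip invocation for that (archive and input names are placeholders; -m0=PPMd selects the PPMd method and -mx=9 the maximum preset):<p><pre><code> # pack an XML dump with the PPMd method at the highest preset
 7z a -m0=PPMd -mx=9 dump.7z dump.xml
</code></pre>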