Many times faster (de)compression using multiple processors.

37 points by igorhvr about 13 years ago

7 comments

alecco about 13 years ago
It's good that people are getting interested in the subject. But this is very odd and has some errors. For example, xz requires a lot more memory than bzip2 (see the benchmarks below, Mem column).

http://mattmahoney.net/dc/text.html
http://mattmahoney.net/dc/uiq/

Matt Mahoney maintains the best benchmarks on text and generic compression. Some of the best people in the field (like Matt) usually hang out at encode.ru.
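
Claims like this are easy to check locally with GNU time. A minimal sketch, assuming GNU time is installed at /usr/bin/time and using a placeholder input file; peak memory shows up in the "Maximum resident set size" line:

    # compare wall time and peak memory of bzip2 vs xz at their highest presets
    /usr/bin/time -v bzip2 -9 -k corpus.txt   # -k keeps the original file
    /usr/bin/time -v xz -9 -k corpus.txt      # xz -9 uses a 64 MiB dictionary, so expect a much larger peak
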
bmm6o about 13 years ago
Is this decompressing a single stream on multiple processors? My knowledge of gzip is very limited, but I would have thought sequential processing was required. What's the trick here? (TFA doesn't explain anything, and e.g. the pigz homepage doesn't either.)
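
For compression, the usual trick is that concatenated gzip members are themselves a valid gzip stream, so independent chunks can be compressed on separate cores and joined afterwards. A minimal sketch of the idea (file names and chunk size are placeholders; pigz does this internally with much smaller blocks):

    # split the input, compress the chunks on all cores, then concatenate;
    # gunzip decompresses the multi-member result as a single file
    split -b 64M -d bigfile chunk-
    ls chunk-* | xargs -P "$(nproc)" -n 1 gzip
    cat chunk-*.gz > bigfile.gz

Decompressing a single stream is a different matter: each deflate block can reference the previous 32 KB of output, so pigz's documentation says decompression itself is not parallelized; its extra threads only handle reading, writing, and checksum calculation.
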
joshbaptiste about 13 years ago
Had to try this on my quad-core laptop, as I'd never heard of these tools.

    josh@snoopy:~/Downloads $ grep -m2 -i intel /proc/cpuinfo
    vendor_id : GenuineIntel
    model name : Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz
    josh@snoopy:~/Downloads $ ls -l test
    -rw-r--r-- 1 josh josh 1073741824 2012-03-07 20:06 test
    josh@snoopy:~/Downloads $ time gzip test

    real    0m16.430s
    user    0m10.210s
    sys     0m0.490s

    josh@snoopy:~/Downloads $ time pigz test

    real    0m5.028s
    user    0m16.040s
    sys     0m0.620s

Looks good, although the man page describes it as "an almost compatible replacement for the gzip program".
isocpprar about 13 years ago
Is xz less resource-intensive than bzip2? My testing (admittedly two years ago or so) showed significant differences: a better compression ratio with xz, but significantly longer run times and/or more memory used.
PaulHoule about 13 years ago
If you're handling a lot of data, it makes sense to hash-partition it on some key and spread it out across a large number of files.

In that case you might have, say, 512 partitions, and you can farm out compression, decompression, and other tasks to as many CPUs as you want, even other machines in a cluster.
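
A minimal sketch of that layout, assuming tab-separated records keyed on the first field; the partition count, hash, and file names are illustrative, and gawk is assumed because it transparently manages more open output files than the OS limit:

    # hash-partition records into 512 files by key, then compress the partitions in parallel
    gawk -F'\t' 'BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
        h = 0
        for (i = 1; i <= length($1); i++)
            h = (h * 31 + ord[substr($1, i, 1)]) % 512
        print > sprintf("part-%03d", h)
    }' records.tsv
    ls part-* | xargs -P "$(nproc)" -n 1 gzip

Each part-NNN file then compresses (and later decompresses) independently, which is what makes the work trivial to spread across cores or machines.
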
mappu about 13 years ago
I like to use PPMd (via 7zip) for large volumes of text, but it seems to be single-threaded only, which is a shame. It cuts a good 30% again off the size of the .xml.bz2 files that Wikipedia provides.
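
For reference, selecting PPMd in 7-Zip looks roughly like this (archive and input names are placeholders, and the mem/o model parameters are optional tuning knobs, not required):

    # build a 7z archive with the PPMd method instead of the default LZMA
    7z a -m0=PPMd:mem=256m:o=16 enwiki.7z enwiki-latest-pages-articles.xml
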
dhruvbird about 13 years ago
This is awesome, since parallel compression has been largely neglected in practice.