Massive Speed Gains via Parallelized BZIP2 Compression

43 points by SnowLprd almost 13 years ago

11 comments

aphyr almost 13 years ago
"18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"

Er, not really. How about...

"pbzip2 reduced running time by 80%."

"pbzip2 took only 20% as long as bzip2 did."

"pbzip2 is five times faster."
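For concreteness, the arithmetic behind those three phrasings, using the timings quoted from the article:

    bzip2_s, pbzip2_s = 18.7, 3.5

    reduction = (bzip2_s - pbzip2_s) / bzip2_s  # fraction of running time removed
    fraction = pbzip2_s / bzip2_s               # pbzip2's time relative to bzip2's
    speedup = bzip2_s / pbzip2_s                # how many times faster

    print(f"reduced running time by {reduction:.0%}")  # ~81%
    print(f"took only {fraction:.0%} as long")         # ~19%
    print(f"{speedup:.1f}x faster")                    # ~5.3x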
wmf almost 13 years ago
BTW, bz2 is kinda over. Check out xz and the parallel version pxz.
th0ma5 almost 13 years ago
Since our move to multicore over faster processors, I'm sure we'll see a lot of this sort of thing: people suddenly realizing that their code will be some multiple faster if they can find a way to do operations in parallel. I imagine the compression itself might be slightly less optimal, however, since similar blocks that could have been compressed together end up on different threads? I didn't dig into whether this is a concern with this project. Long and short of it, though: parallel is the reality. In theory one could arbitrarily split the file, compress each of the splits, and get a roughly comparable speedup?
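One way to probe that "less optimal" question is to compress the same data whole versus in independent chunks and compare the output sizes. A rough sketch (the sample data and the 900 kB chunk size are arbitrary choices, not taken from the project):

    import bz2
    import os

    CHUNK = 900 * 1024  # roughly bzip2's internal block size at -9
    data = os.urandom(64 * 1024) * 128  # ~8 MB of repetitive sample data

    whole = len(bz2.compress(data))
    chunked = sum(
        len(bz2.compress(data[i:i + CHUNK]))
        for i in range(0, len(data), CHUNK)
    )
    print(f"whole-file: {whole} bytes, chunked: {chunked} bytes")

Since bzip2 already compresses in ~900 kB blocks internally, chunking at that boundary mostly costs per-stream header overhead plus whatever redundancy happened to span a chunk boundary.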
sciurus almost 13 years ago
For parallel gzip there's pigz (pronounced pig-zee).

http://www.zlib.net/pigz/
dguido almost 13 years ago
Parallel gzip, in case anyone wanted it: http://zlib.net/pigz/

I've used it to great effect during incident response when I needed to search through hundreds of gigs of logs at a time.
malkia almost 13 years ago
"The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"<p>File cache effect? He should cold reboot first (not sure how you force the file cache out on OSX/linux, on Windows I do it with SysInternals RamMap) and try in different order.<p>It could still be faster, but he could really be measuring I/O that was done in the first case, and not in the second.<p>It's also strange that .tar files are used, not tar.bz2 or .tbz (if such extension makes sense)
mattst88 almost 13 years ago
I used to use pbzip2 before I learned about lbzip2 (http://lacos.hu/).

lbzip2 is able to decompress single streams using multiple threads, which apparently pbzip2 cannot do. See the thread beginning with http://lists.debian.org/debian-mentors/2009/02/msg00098.html
juiceandjuice almost 13 years ago
bzip2 has always been parallelizable. At one point a few years ago I was working on a compressed file format that included compressed-block metadata, because bzip2 is most efficient when it gets about ~900 kB to compress at a time. In effect, you split the file up into 900 kB chunks, compress them in parallel, and recombine them into one file at the end.
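That split-compress-recombine scheme is straightforward to sketch, since concatenated .bz2 streams are still valid bzip2 output (both the command-line tool and Python's bz2 reader handle multi-stream files). A rough illustration of the idea, not pbzip2's actual implementation:

    import bz2
    from multiprocessing import Pool

    CHUNK = 900 * 1024  # the ~900 kB sweet spot mentioned above

    def compress_chunk(chunk):
        return bz2.compress(chunk, compresslevel=9)

    def parallel_bzip2(src, dst, workers=4):
        with open(src, "rb") as f, Pool(workers) as pool, open(dst, "wb") as out:
            chunks = iter(lambda: f.read(CHUNK), b"")  # ~900 kB pieces, lazily
            for compressed in pool.imap(compress_chunk, chunks):
                out.write(compressed)  # concatenate streams, in order

    # Usage (guard with `if __name__ == "__main__":` on spawn-based platforms):
    # parallel_bzip2("logs.tar", "logs.tar.bz2")
    # data = bz2.open("logs.tar.bz2", "rb").read()  # reads all streams back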
Inufu almost 13 years ago
Is there a reason this is not the default?
BrainInAJar almost 13 years ago
is there a pbzip2 that doesn't eat *all* your memory?
rorrr almost 13 years ago
A GPU implementation would be cool.