Massive Speed Gains via Parallelized BZIP2 Compression

43 points by SnowLprd almost 13 years ago

11 comments

aphyr almost 13 years ago
"18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"

Er, not really. How about...

"pbzip2 reduced running time by 80%."

"pbzip2 took only 20% as long as bzip2 did."

"pbzip2 is five times faster."
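For concreteness, the arithmetic behind those three phrasings, using the timings quoted from the article:

    bzip2_s, pbzip2_s = 18.7, 3.5

    reduction = (bzip2_s - pbzip2_s) / bzip2_s  # fraction of running time removed
    fraction = pbzip2_s / bzip2_s               # pbzip2's time relative to bzip2's
    speedup = bzip2_s / pbzip2_s                # how many times faster

    print(f"reduced running time by {reduction:.0%}")  # ~81%
    print(f"took only {fraction:.0%} as long")         # ~19%
    print(f"{speedup:.1f}x faster")                    # ~5.3x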
wmf almost 13 years ago
BTW, bz2 is kinda over. Check out xz and the parallel version pxz.
th0ma5 almost 13 years ago
Since our move to multicore over faster processors, I'm sure we'll see a lot of this sort of thing: people suddenly realizing that their code will be some multiple faster if they can find a way to do operations in parallel. I imagine the compression itself might be slightly less optimal, however, since similar blocks that could have been compressed together end up on different threads? I didn't dig into whether this is a concern with this project. Long and short of it, though: parallel is the reality. In theory one could arbitrarily split the file, compress each of the splits, and get a roughly comparable speedup?
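One way to probe that "less optimal" question is to compress the same data whole versus in independent chunks and compare the output sizes. A rough sketch (the sample data and the 900 kB chunk size are arbitrary choices, not taken from the project):

    import bz2
    import os

    CHUNK = 900 * 1024  # roughly bzip2's internal block size at -9
    data = os.urandom(64 * 1024) * 128  # ~8 MB of repetitive sample data

    whole = len(bz2.compress(data))
    chunked = sum(
        len(bz2.compress(data[i:i + CHUNK]))
        for i in range(0, len(data), CHUNK)
    )
    print(f"whole-file: {whole} bytes, chunked: {chunked} bytes")

Since bzip2 already compresses in ~900 kB blocks internally, chunking at that boundary mostly costs per-stream header overhead plus whatever redundancy happened to span a chunk boundary.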
sciurus almost 13 years ago
For parallel gzip there's pigz (pronounced pig-zee).

http://www.zlib.net/pigz/
dguido almost 13 years ago
Parallel gzip, in case anyone wanted it: http://zlib.net/pigz/

I've used it to great effect during incident response when I needed to search through hundreds of gigs of logs at a time.
malkia almost 13 years ago
"The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"<p>File cache effect? He should cold reboot first (not sure how you force the file cache out on OSX/linux, on Windows I do it with SysInternals RamMap) and try in different order.<p>It could still be faster, but he could really be measuring I/O that was done in the first case, and not in the second.<p>It's also strange that .tar files are used, not tar.bz2 or .tbz (if such extension makes sense)
mattst88 almost 13 years ago
I used to use pbzip2 before I learned about lbzip2 (http://lacos.hu/).

lbzip2 is able to decompress single streams using multiple threads, which apparently pbzip2 cannot do. See the thread beginning with http://lists.debian.org/debian-mentors/2009/02/msg00098.html
juiceandjuice almost 13 years ago
bzip2 has always been parallelizable. At one point a few years ago I was working on a compressed file format that included compressed-block metadata, because bzip2 is most efficient when it gets about ~900 kB to compress at a time. In effect, you split the file up into 900 kB chunks, compress them in parallel, and recombine them into one file at the end.
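That split-compress-recombine scheme is straightforward to sketch, since concatenated .bz2 streams are still valid bzip2 output (both the command-line tool and Python's bz2 reader handle multi-stream files). A rough illustration of the idea, not pbzip2's actual implementation:

    import bz2
    from multiprocessing import Pool

    CHUNK = 900 * 1024  # the ~900 kB sweet spot mentioned above

    def compress_chunk(chunk):
        return bz2.compress(chunk, compresslevel=9)

    def parallel_bzip2(src, dst, workers=4):
        with open(src, "rb") as f, Pool(workers) as pool, open(dst, "wb") as out:
            chunks = iter(lambda: f.read(CHUNK), b"")  # ~900 kB pieces, lazily
            for compressed in pool.imap(compress_chunk, chunks):
                out.write(compressed)  # concatenate streams, in order

    # Usage (guard with `if __name__ == "__main__":` on spawn-based platforms):
    # parallel_bzip2("logs.tar", "logs.tar.bz2")
    # data = bz2.open("logs.tar.bz2", "rb").read()  # reads all streams back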
Inufu almost 13 years ago
Is there a reason this is not the default?
BrainInAJar almost 13 years ago
is there a pbzip2 that doesn't eat *all* your memory?
rorrr almost 13 years ago
A GPU implementation would be cool.