"18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"

Er, not really. How about...

"pbzip2 reduced running time by 80%."

"pbzip2 took only 20% as long as bzip2 did."

"pbzip2 is five times faster."
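For the record, the arithmetic behind those phrasings, using the article's 18.7 s and 3.5 s figures:

    # Same two numbers, three phrasings.
    bzip2_s, pbzip2_s = 18.7, 3.5
    print(f"running time reduced by {1 - pbzip2_s / bzip2_s:.0%}")  # ~81%
    print(f"takes {pbzip2_s / bzip2_s:.0%} as long")                # ~19%
    print(f"{bzip2_s / pbzip2_s:.1f}x faster")                      # ~5.3x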
Since our move to multicore over faster processors, I'm sure we'll see a lot of this sort of thing: people suddenly realizing that their code can be some multiple faster if they find a way to do operations in parallel. I imagine the compression itself might be slightly less optimal, though, since similar blocks that could have been compressed together end up on different threads? I didn't dig into whether that is or isn't a concern with this project. The long and short of it is that parallel is the reality. In theory one could arbitrarily split the file, compress each of the splits, and get a roughly proportional speedup?
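One quick way to check that ratio question empirically (just a sketch; "archive.tar" is a placeholder for whatever file you're testing): compress the whole buffer in one go, then in independent ~900 kB chunks, and compare the sizes. Since bzip2 itself compresses in blocks of at most 900 kB, the chunked total should normally come out only slightly larger.

    import bz2

    data = open("archive.tar", "rb").read()
    CHUNK = 900_000  # bzip2's maximum block size at -9

    whole = len(bz2.compress(data))
    chunked = sum(len(bz2.compress(data[i:i + CHUNK]))
                  for i in range(0, len(data), CHUNK))
    print(f"single stream: {whole} bytes, chunked: {chunked} bytes")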
Parallel gzip, in case anyone wanted it: http://zlib.net/pigz/

I've used it to great effect during incident response when I needed to search through hundreds of gigs of logs at a time.
"The results: 18.7 seconds for bzip2, and… wait for it… 3.5 seconds for pbzip2. That’s an increase of over 80%!"<p>File cache effect? He should cold reboot first (not sure how you force the file cache out on OSX/linux, on Windows I do it with SysInternals RamMap) and try in different order.<p>It could still be faster, but he could really be measuring I/O that was done in the first case, and not in the second.<p>It's also strange that .tar files are used, not tar.bz2 or .tbz (if such extension makes sense)
I used to use pbzip2 before I learned about lbzip2 (http://lacos.hu/).

lbzip2 is able to decompress single streams using multiple threads, which apparently pbzip2 cannot do. See the thread beginning with http://lists.debian.org/debian-mentors/2009/02/msg00098.html
bzip2 has always been parallelizable. At one point a few years ago I was working on a compressed file format that included compressed block metadata, because bzip2 is most efficient when it gets about 900 kB to compress at a time. In effect, you split the file up into ~900 kB chunks, compress them in parallel, and recombine them into one file at the end.
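That scheme is easy to sketch with Python's bz2 module and a process pool (file names are placeholders; this illustrates the idea, not pbzip2's actual code). The output is just a concatenation of independent .bz2 streams, which stock bzip2 is documented to decompress back to the original data.

    import bz2
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 900_000  # roughly bzip2's block size at compression level 9

    def parallel_bzip2(in_path, out_path):
        data = open(in_path, "rb").read()
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
        # Compress the chunks as independent bzip2 streams, one per worker.
        with ProcessPoolExecutor() as pool:
            blocks = list(pool.map(bz2.compress, chunks))
        # Recombine: concatenated streams decompress back to the original file.
        with open(out_path, "wb") as out:
            for block in blocks:
                out.write(block)

    if __name__ == "__main__":
        parallel_bzip2("archive.tar", "archive.tar.bz2")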