It looks like the speedup is coming from two main changes.<p>The first is reducing the number of rounds from 10 to 7. Think of it like making a smoothie: you add bits of fruit to the drink (the input data), then pulse the blades to blend it up (producing the output hash). This change runs the blades for 7 seconds instead of 10 each time fruit is added. They cite evidence that the extra 3 seconds aren't doing much - once the fruit's fully liquid, extra blending doesn't help - but I worry that this reduces the security margin. Maybe those extra 3 rounds aren't useful against current attacks, but they may be useful against unknown future attacks.<p>The other change is to break the input into 1KiB chunks and hash each chunk independently, then combine the individual chunk hashes into a single root hash using a binary tree. The benefit is that if you have 4KiB of data, you can use 4-way SIMD instructions to process all four chunks simultaneously. The more data you have, the more parallelism you can unlock, unlike traditional hash functions that process everything sequentially. Separately, modern SIMD units can do two 32-bit operations as fast as one 64-bit operation, so building the algorithm out of 32-bit arithmetic costs nothing there, while giving a big boost to low-end 32-bit CPUs that struggle with 64-bit arithmetic. The tree structure is a big win overall.
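For the curious, the chunk-and-tree idea can be sketched in a few lines of Python. This uses hashlib.blake2s as a stand-in compression function and is emphatically not real BLAKE3, which also mixes in chunk counters, domain-separation flags, and its own IVs, and merges subtrees in a specific order:

```python
import hashlib

CHUNK_SIZE = 1024  # 1 KiB chunks, as in BLAKE3

def chunk_hash(chunk: bytes) -> bytes:
    # Stand-in for BLAKE3's chunk compression; the real thing also
    # mixes in a chunk counter and domain-separation flags.
    return hashlib.blake2s(chunk).digest()

def tree_hash(data: bytes) -> bytes:
    # Hash each 1 KiB chunk independently -- this is the step that
    # can run 4/8/16-wide with SIMD, or across threads.
    leaves = [chunk_hash(data[i:i + CHUNK_SIZE])
              for i in range(0, len(data), CHUNK_SIZE)] or [chunk_hash(b"")]
    # Combine pairs of subtree hashes up a binary tree; an odd
    # leftover subtree is carried up unmerged (simplified here).
    while len(leaves) > 1:
        leaves = [hashlib.blake2s(leaves[i] + leaves[i + 1]).digest()
                  if i + 1 < len(leaves) else leaves[i]
                  for i in range(0, len(leaves), 2)]
    return leaves[0]
```

With 4KiB of input you get four independent `chunk_hash` calls, which is exactly the work a SIMD or multi-threaded implementation runs in parallel.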
Another benchmark:<p>time openssl sha256 /tmp/bigfile<p><pre><code> real 0m28.160s
user 0m27.750s
sys 0m0.272s
</code></pre>
time shasum -a 256 /tmp/bigfile<p><pre><code> real 0m6.146s
user 0m5.407s
sys 0m0.560s
</code></pre>
time b2sum /tmp/bigfile<p><pre><code> real 0m1.732s
user 0m1.450s
sys 0m0.244s
</code></pre>
time b3sum /tmp/bigfile<p><pre><code> real 0m0.212s
user 0m0.996s
sys 0m0.379s
</code></pre>
TIL the openssl sha256 invocation is really slow compared to the shasum program. Also, BLAKE3 is <i>really</i> fast.<p>Edit: bigfile is 1GB of /dev/random
So this is bao + blake2?<p>I remember watching a talk a while ago on Bao, "a general purpose cryptographic tree hash, and perhaps the fastest hash function in the world": <a href="https://www.youtube.com/watch?v=Dya9c2DXMqQ" rel="nofollow">https://www.youtube.com/watch?v=Dya9c2DXMqQ</a><p>Nice job!
Looks like they've taken <i>Too Much Crypto</i> to heart[1] and dropped the number of rounds from Blake2B's 12 down to 7 for Blake3:<p><a href="https://github.com/BLAKE3-team/BLAKE3/blob/master/reference_impl/reference_impl.rs#L83-L95" rel="nofollow">https://github.com/BLAKE3-team/BLAKE3/blob/master/reference_...</a><p><a href="https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref.c#L200-L211" rel="nofollow">https://github.com/BLAKE2/BLAKE2/blob/master/ref/blake2b-ref...</a><p>Which, yeah, that alone will get you a significant improvement over Blake2B, but it definitely doesn't account for the huge improvement they're showing. Most of that, I think, is the ability to take advantage of AVX512 parallelism; the difference should be more incremental on AVX2-only amd64 and other platforms.<p>[1]: Well, TMC recommended 8 rounds for Blake2B and 7 for Blake2S.
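To make the round-count tradeoff concrete, here's a sketch of the 32-bit ARX quarter-round G shared (up to message scheduling and constants) by BLAKE2s and BLAKE3, written from memory, so treat the details as illustrative rather than authoritative. The point is that compression cost is linear in the round count, so 7 rounds vs 12 is directly ~40% less mixing work:

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    # 32-bit rotate right.
    return ((x >> n) | (x << (32 - n))) & MASK32

def g(state, a, b, c, d, x, y):
    # The add-rotate-xor quarter-round; rotation amounts 16/12/8/7
    # are as I recall them from the BLAKE2s spec.
    state[a] = (state[a] + state[b] + x) & MASK32
    state[d] = rotr32(state[d] ^ state[a], 16)
    state[c] = (state[c] + state[d]) & MASK32
    state[b] = rotr32(state[b] ^ state[c], 12)
    state[a] = (state[a] + state[b] + y) & MASK32
    state[d] = rotr32(state[d] ^ state[a], 8)
    state[c] = (state[c] + state[d]) & MASK32
    state[b] = rotr32(state[b] ^ state[c], 7)

def toy_permute(words, rounds):
    # Toy loop (column step only, zero message words): each extra
    # round is one more full pass of G over the 16-word state, so
    # total cost scales linearly with `rounds`.
    state = list(words)
    for _ in range(rounds):
        for i in range(4):
            g(state, i, i + 4, i + 8, i + 12, 0, 0)
    return state
```

A real round also does a "diagonal" step and feeds in scheduled message words; the omitted parts don't change the linear cost-per-round picture.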
That's impressive speedup. I just installed it and holy moly it really is fast. All those extra cores can finally get busy. ;)<p>b3sum -l 256 big-2.6Gfile<p><pre><code> real 0m0.384s
user 0m2.302s
sys 0m0.175s
</code></pre>
b2sum -l 256 big-2.6Gfile<p><pre><code> real 0m3.616s
user 0m3.360s
sys 0m0.256s
</code></pre>
(Intel® Core™ i7-8550U CPU @ 1.80GHz × 8 )<p>EDIT: ah, the catch. blake3 targets 128 bit security. It competes with SipHash for speed and security<p>EDIT2 scratch the previous edit.
I can't seem to find any non-Rust implementations in the works yet, so I may sit down and adapt the reference implementation to C# this weekend. Does anyone know how single-threaded (or few-thread) performance holds up without AVX512?
smhasher results, without the Rust version yet (which should be ~2x faster):<p><a href="http://rurban.github.io/smhasher/doc/table.html" rel="nofollow">http://rurban.github.io/smhasher/doc/table.html</a><p>It's of course much faster than most of the other crypto hashes, but not faster than the hardware variants, SHA1-NI and SHA256-NI.
About 4x faster than blake2.<p>Faster than SipHash, but not faster than SipHash13.<p>The tests fail dramatically on MomentChi2, which describes how well a user-provided random seed is mixed in. I tried mixing a seed into IV[0], as with all other hardened crypto hashes, and into all 8 IVs, which didn't help. So I'm not convinced that a seeded IV is properly mixed in. That's outside the usage pattern of a crypto or digest hash (b3sum), but inside normal usage.<p>The Rust staticlib is still in the works; it would parallelize the hashing in chunks for big keys. For small keys it should even be a bit slower. b3sum is so much faster because it uses many more tricks, such as mmap.
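For what it's worth, the designed-in way to mix a seed into the BLAKE family is keyed mode rather than patching IV words: the key gets run through a full compression before any input is absorbed. A sketch with Python's hashlib BLAKE2 bindings (BLAKE3's keyed mode is analogous, but hashlib doesn't ship it):

```python
import hashlib

def seeded_digest(data: bytes, seed: bytes) -> bytes:
    # blake2s accepts a key of up to 32 bytes; keyed mode pads the
    # key to a full block and compresses it first, which is stronger
    # mixing than overwriting an IV word.
    return hashlib.blake2s(data, key=seed).digest()
```

So for smhasher-style seeding, routing the seed through the key (or salt) parameter is the supported path; ad-hoc IV patching is off the documented map.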
Why are there no AES-like hashing algorithms out there? The AES design is very suitable as a building block in a hash if you remove the "add round key" operation.
Is it possible to benchmark against blake2 etc. with the same number of rounds - both reducing blake2's rounds and increasing blake3's? Also, in that vein, could a version with more rounds win over the "paranoid" crowd, by being essentially a faster Blake2 thanks to SIMD, with extra features thanks to the Merkle tree?
<p><pre><code> Benchmark #1: cat b1
Time (mean ± σ): 1.076 s ± 0.007 s [User: 5.3 ms, System: 1069.4 ms]
Range (min … max): 1.069 s … 1.093 s 10 runs
Benchmark #2: sha256sum b1
Time (mean ± σ): 6.583 s ± 0.064 s [User: 5.440 s, System: 1.137 s]
Range (min … max): 6.506 s … 6.695 s 10 runs
Benchmark #3: sha1sum b1
Time (mean ± σ): 6.322 s ± 0.086 s [User: 5.212 s, System: 1.103 s]
Range (min … max): 6.214 s … 6.484 s 10 runs
Benchmark #4: b2sum b1
Time (mean ± σ): 13.184 s ± 0.108 s [User: 12.090 s, System: 1.080 s]
Range (min … max): 13.087 s … 13.382 s 10 runs
Benchmark #5: b3sum b1
Time (mean ± σ): 577.0 ms ± 5.4 ms [User: 12.276 s, System: 0.669 s]
Range (min … max): 572.4 ms … 587.0 ms 10 runs
Benchmark #6: md5sum b1
Time (mean ± σ): 14.851 s ± 0.175 s [User: 13.717 s, System: 1.117 s]
Range (min … max): 14.495 s … 15.128 s 10 runs
Summary
'b3sum b1' ran
1.86 ± 0.02 times faster than 'cat b1'
10.96 ± 0.18 times faster than 'sha1sum b1'
11.41 ± 0.15 times faster than 'sha256sum b1'
22.85 ± 0.28 times faster than 'b2sum b1'
25.74 ± 0.39 times faster than 'md5sum b1'
</code></pre>
gotdang that's some solid performance. (Here running against 10GiB of random bytes; the machine has the SHA ASM extensions, which is why sha256/sha1 perform so well.)<p>edit: actually not a straight algorithm comparison, as b3sum here benefits heavily from multi-threading; without that it looks more like this:<p><pre><code> Benchmark #1: cat b1
Time (mean ± σ): 1.090 s ± 0.007 s [User: 2.9 ms, System: 1084.8 ms]
Range (min … max): 1.071 s … 1.096 s 10 runs
Benchmark #2: sha256sum b1
Time (mean ± σ): 6.480 s ± 0.097 s [User: 5.359 s, System: 1.115 s]
Range (min … max): 6.346 s … 6.587 s 10 runs
Benchmark #3: sha1sum b1
Time (mean ± σ): 6.120 s ± 0.090 s [User: 5.027 s, System: 1.082 s]
Range (min … max): 5.979 s … 6.233 s 10 runs
Benchmark #4: b2sum b1
Time (mean ± σ): 12.866 s ± 0.208 s [User: 11.722 s, System: 1.133 s]
Range (min … max): 12.549 s … 13.124 s 10 runs
Benchmark #5: b3sum b1
Time (mean ± σ): 5.813 s ± 0.079 s [User: 4.606 s, System: 1.202 s]
Range (min … max): 5.699 s … 5.933 s 10 runs
Benchmark #6: md5sum b1
Time (mean ± σ): 14.355 s ± 0.184 s [User: 13.305 s, System: 1.039 s]
Range (min … max): 14.119 s … 14.605 s 10 runs
Summary
'cat b1' ran
5.33 ± 0.08 times faster than 'b3sum b1'
5.62 ± 0.09 times faster than 'sha1sum b1'
5.95 ± 0.10 times faster than 'sha256sum b1'
11.81 ± 0.21 times faster than 'b2sum b1'
13.17 ± 0.19 times faster than 'md5sum b1'
</code></pre>
still beating the dedicated sha extensions, but not nearly as dramatically.
Where is this useful?<p>I'm guessing not for password hashes simply because a fast hash is bad for passwords (makes brute forcing/rainbow tables easier).<p>So is this mostly just for file signing?
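Right - for password storage you want the opposite of fast. Fast hashes like this are for integrity checks, content addressing (dedup, Merkle trees, verified streaming), and key derivation from high-entropy input. A sketch of the split using Python's hashlib, with blake2b standing in for a fast content hash (hashlib has no BLAKE3) and scrypt as the deliberately expensive password side; the parameters here are illustrative, not a tuning recommendation:

```python
import hashlib

# Fast hash: file/content integrity -- speed is a feature.
def content_id(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()

# Password storage: speed is a bug; use a deliberately expensive,
# memory-hard KDF with a per-password salt instead.
def password_hash(password: bytes, salt: bytes) -> bytes:
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)
```

Same hashlib module, opposite design goals: the first should saturate your disk, the second should make each guess cost real CPU and memory.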