Blake3 is a clear winner for large inputs.<p>However, for smaller inputs (~1024 bytes and down), the performance gap between it and everything else (blake2, sha256) gets much narrower, because you don't get to benefit from the structural parallelization.<p>If you're mostly dealing with small inputs, raw hash throughput is <i>probably</i> not high on your list of concerns - In the context of a protocol or application, other costs like IO latency probably completely dwarf the actual CPU time spent hashing.<p>If raw performance is no longer high on your list of priorities, you care more about the other things - ubiquitous and battle-tested library support (blake3 is still pretty bleeding-edge, in the grand scheme of things), FIPS compliance (sha256), greater on-paper security margin (blake2). Which is all to say, while blake3 <i>is</i> great, there are still plenty of reasons not to prefer it for a particular use-case.