TechEcho

5 comments

molenzwiebel 3 months ago

For this use case, you can squeeze out even more performance by using the SHA-1 implementation in Intel ISA-L Crypto [1]. It supports multi-buffer hashing, which lets you calculate the hashes of multiple chunks in parallel on a single core. Given that this is basically your use case, it might be worth considering. I doubt it'll provide much speedup if you're already I/O bound here, though.

[1]: https://github.com/intel/isa-l_crypto
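Multi-buffer hashing works because each chunk's hash is independent of every other chunk's. The sketch below is not ISA-L's API: it swaps in Python's `hashlib` and a thread pool as a stand-in, only to show that pieces can be hashed in any order or concurrently with the same result. The names `split_pieces` and `hash_pieces` and the 16 KiB piece length are made up for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_pieces(data, piece_len):
    """Cut a byte string into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_len] for i in range(0, len(data), piece_len)]


def hash_pieces(pieces, workers=4):
    """Hash every piece independently; pool.map keeps the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: hashlib.sha1(p).digest(), pieces))


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    pieces = split_pieces(data, 16 * 1024)  # hypothetical piece length
    digests = hash_pieces(pieces)
    print(len(pieces), "pieces,", digests[0].hex())
```

A real multi-buffer implementation interleaves several of these independent hashes in SIMD lanes on one core instead of using threads, but the independence property it relies on is the same.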

ack_complete 3 months ago

SHA1 is difficult to vectorize due to a tight loop-carried dependency in the main operation. In an optimized build, I've only seen about a 15% speedup over the scalar version with x64 SSSE3 without hardware SHA1 support. Debug builds of course can benefit more from the reduction in operations, since inefficient code generation is a bigger issue there than the dependency chains. I think the performance delta is bigger for ARM64 CPUs, but it's pretty rare to not have the Crypto extension (except notably some Raspberry Pi models).

The comments in the SSE2 version are a bit odd, as they reference MMX and the Pentium M and Efficeon CPUs. Those CPUs are ancient (2003/2004 era). The vectorized code you have also uses SSE2 and not MMX, which is important since SSE2 is double the width and has different performance characteristics from MMX. IIRC, Intel CPUs didn't start supporting SHA until ~2019 with Ice Lake, so the target for non-hardware-accelerated vectorized SHA1 for Intel CPUs would be mostly Skylake-based.
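To make the loop-carried dependency concrete, here is a minimal pure-Python SHA-1, written only for illustration (it is not the code from the post and is far too slow for real use). The message schedule `w` depends only on earlier schedule words, which is the part SIMD code can batch. The 80 rounds are different: each round's new `a` needs the `a` produced by the round before it, so they form one serial chain.

```python
import hashlib
import struct


def rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_compress(state, block):
    """Process one 64-byte block and return the updated 5-word state."""
    # Message schedule: w[t] only reads words at least 3 positions back,
    # so it can be computed in small batches (the SIMD-friendly part).
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        # Loop-carried dependency: this round cannot start until the
        # previous round has produced its `a`.
        temp = (rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, rotl(b, 30), c, d

    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))


def sha1(data):
    """Return the 20-byte SHA-1 digest of a byte string."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = data + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(data) * 8)
    for i in range(0, len(padded), 64):
        state = sha1_compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)


if __name__ == "__main__":
    assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```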

bean-weevil 3 months ago

Why not just compile that particular object with optimizations on and the rest of the file with optimizations off?

tvbusy 3 months ago

I understand the post is about learning to speed up SHA1 calculation, and on that I have no comment. However, the state file is a solved problem for me. State files are rarely corrupted, and when one is, it's simple to just re-check the file. I cannot imagine a torrent client checking the hash of TBs of files on every single start. It's not a coincidence that many torrent clients have a feature to skip hash checking, assume the file is correct, and start seeding immediately.
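One way a client might decide whether it can skip the full hash check is to compare cheap file metadata against what it saved last time. The sketch below is an assumption for illustration, not any particular client's state-file format: `file_fingerprint`, `needs_recheck`, and the size/mtime dictionary are all hypothetical names.

```python
import os


def file_fingerprint(path):
    """Cheap metadata to store in a (hypothetical) state file."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def needs_recheck(path, saved_state):
    """True when the file should be re-hashed.

    That is the case when there is no saved state, the file is missing,
    or its size or modification time differs from the saved values.
    """
    if saved_state is None:
        return True
    try:
        return file_fingerprint(path) != saved_state
    except FileNotFoundError:
        return True
```

Only files for which `needs_recheck` returns `True` would go through the expensive SHA1 pass; the rest are trusted as-is, which is the trade-off the comment describes.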

Neywiny 3 months ago

If I were a betting person, I'd bet that the sha1 instructions and the openssl instructions map to similar enough uops. I'm unsure if there's a way to check, but that's my understanding of the thousands of instructions in modern processors: mostly just assigning names to common patterns.

Making my debug build run 100x faster so that it is finally usable
