Vlad Krasnov, who wrote much of the crypto assembly code in OpenSSL and Go, recently published a blog post arguing that the frequency scaling triggered by AVX-512 can make it not worth using: <a href="https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/" rel="nofollow">https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...</a><p>He doesn't like the title of the OP and provided links:<p>> Very misleading title. Could just as well name it "accelerate sha256 up to 134x". You need to compare apples to apples. If AVX2 was used in the same way AVX512 is used, the speedup would be 2X at most. Reminds me of two of my papers <a href="https://eprint.iacr.org/2012/371.pdf" rel="nofollow">https://eprint.iacr.org/2012/371.pdf</a> <a href="https://eprint.iacr.org/2012/067.pdf" rel="nofollow">https://eprint.iacr.org/2012/067.pdf</a><p>(from <a href="https://twitter.com/thecomp1ler/status/940724783804645376" rel="nofollow">https://twitter.com/thecomp1ler/status/940724783804645376</a>)<p>EDIT: Thanks 'delhanty!
This is assembly, not pure Go, but it doesn't use CGO, which is probably what they mean.<p>Intel Cannon Lake processors will support the SHA instruction extensions (currently available only on Goldmont). It will be interesting to see how that compares with this approach of running 16 SHA computations in parallel. You would be able to get rid of the scheduling overhead of having to first queue up 16 SHA calculations from other threads.
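<p>To make that queuing overhead concrete, here is a rough Go sketch of the batching pattern as I understand it. It is not the library's actual API: `request`, `server`, `hashBatch`, and `HashAll` are hypothetical names, and `hashBatch` just calls the standard `crypto/sha256` once per message as a stand-in for a real 16-lane AVX-512 kernel. The point is only the shape: callers from many goroutines hand work to one collector, which waits for a full batch (or the end of input) before hashing anything.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// lanes is the batch width. A 512-bit register holds sixteen 32-bit words,
// so 16 independent SHA-256 states can advance in lockstep.
const lanes = 16

// request is a hypothetical unit of work: one message plus a reply channel.
type request struct {
	msg   []byte
	reply chan [32]byte
}

// hashOne hashes a single message with the standard library.
func hashOne(msg []byte) [32]byte { return sha256.Sum256(msg) }

// hashBatch is a stand-in for a multi-buffer kernel. A real implementation
// would process all lanes at once; this one loops over them sequentially.
func hashBatch(batch []request) {
	for _, r := range batch {
		r.reply <- hashOne(r.msg) // reply is buffered, so this never blocks
	}
}

// server collects requests until a batch is full, then hashes the batch.
// A partial batch is flushed once the queue is closed.
func server(queue <-chan request, done chan<- struct{}) {
	batch := make([]request, 0, lanes)
	for r := range queue {
		batch = append(batch, r)
		if len(batch) == lanes {
			hashBatch(batch)
			batch = batch[:0]
		}
	}
	hashBatch(batch)
	close(done)
}

// HashAll submits every message from its own goroutine and returns the
// digests in input order.
func HashAll(msgs [][]byte) [][32]byte {
	queue := make(chan request)
	done := make(chan struct{})
	go server(queue, done)

	out := make([][32]byte, len(msgs))
	var sent, finished sync.WaitGroup
	for i, m := range msgs {
		sent.Add(1)
		finished.Add(1)
		go func(i int, m []byte) {
			defer finished.Done()
			r := request{msg: m, reply: make(chan [32]byte, 1)}
			queue <- r
			sent.Done()
			out[i] = <-r.reply // waits until this request's batch is hashed
		}(i, m)
	}
	sent.Wait()  // every request has been queued
	close(queue) // lets the server flush the final partial batch
	finished.Wait()
	<-done
	return out
}

func main() {
	msgs := make([][]byte, 20) // one full batch of 16 plus a partial batch of 4
	for i := range msgs {
		msgs[i] = []byte(fmt.Sprintf("message %d", i))
	}
	digests := HashAll(msgs)
	fmt.Printf("%d digests, first: %x\n", len(digests), digests[0])
}
```

Note that each caller blocks until its batch fills up or the input ends, which is exactly the latency cost a per-core SHA instruction would avoid.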
One thing to note is that the benchmark is running on a Skylake Platinum chip, which has two AVX512 FMAs.<p>You need a Gold 6000 series and above to see any benefit from AVX512. In most other cases the CPU throttles down so much that there’s little to no benefit.
I blogged about the SHA instruction support in the x86_64 ISA a few months back; it’ll be nice to see it actually happen: <a href="https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring-sha-extensions-to-intels-cpus/" rel="nofollow">https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring...</a>
Isn't this the kind of thing that was missing from the "go on different platforms" benchmark a little while back? The Intel platform has crazy optimizations for encryption algorithms, while ARM was severely lacking.