科技回声

4 comments

wmu, more than 5 years ago

The hand-coded AVX2 procedure is far from optimal: it wastes time on a horizontal addition in every iteration.

The First Rule of SIMD-ization says: keep all the intermediate results in vector(s), and do the horizontal reduction at the end.

The conversion from a comparison result into a vector of integers can also be done a bit more simply: just one bitwise AND is needed, followed by a cast to __m256i (the cast doesn't emit any code, as SIMD registers are untyped).
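A minimal sketch of both points, under stated assumptions: the article's exact kernel isn't quoted in this thread, so the task here (counting floats greater than a threshold) and the names `count_greater`, `count_greater_avx2` and `count_greater_scalar` are hypothetical. It also assumes GCC or Clang, whose `target` attribute and `__builtin_cpu_supports` let the AVX2 path compile without `-mavx2` and fall back to scalar code elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#endif

/* Scalar reference: count elements strictly greater than t. */
static size_t count_greater_scalar(const float *x, size_t n, float t) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (x[i] > t);
    return count;
}

#ifdef HAVE_AVX2_PATH
__attribute__((target("avx2")))
static size_t count_greater_avx2(const float *x, size_t n, float t) {
    const __m256 threshold = _mm256_set1_ps(t);
    /* Bit pattern of integer 1 in every 32-bit lane, viewed as floats. */
    const __m256 one_bits = _mm256_castsi256_ps(_mm256_set1_epi32(1));
    __m256i acc = _mm256_setzero_si256(); /* eight per-lane counters */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(x + i);
        /* All-ones lanes where v > threshold, all-zero lanes elsewhere. */
        __m256 mask = _mm256_cmp_ps(v, threshold, _CMP_GT_OQ);
        /* One bitwise AND turns the mask into 0/1; the cast emits no code. */
        __m256i inc = _mm256_castps_si256(_mm256_and_ps(mask, one_bits));
        /* Stay vertical inside the loop: no horizontal add here. */
        acc = _mm256_add_epi32(acc, inc);
    }
    /* Horizontal reduction, done once after the loop.
       Note: each 32-bit lane counter wraps for extremely large inputs. */
    uint32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    size_t count = 0;
    for (int k = 0; k < 8; k++)
        count += lanes[k];
    /* Scalar tail for the last n % 8 elements. */
    for (; i < n; i++)
        count += (x[i] > t);
    return count;
}
#endif

/* Picks the AVX2 path at run time when the CPU supports it. */
static size_t count_greater(const float *x, size_t n, float t) {
    assert(x != NULL || n == 0);
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return count_greater_avx2(x, n, t);
#endif
    return count_greater_scalar(x, n, t);
}
```

Because the reduction runs only once, storing the lanes to memory and summing them in scalar code is good enough here; its cost does not scale with the input size.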

truth_seeker, more than 5 years ago

If someone is seeking more insight into it, follow this link: https://www.slideshare.net/IntelSoftware/simple-single-instruction-multiple-data-simd-with-the-intel-implicit-spmd-program-compiler-intel-ispc

Const-me, more than 5 years ago

> if you want to target multiple ISAs, you need to write multiple algorithms

In my experience, these algorithms are similar to each other. More often than not they don't require too much extra time: a few macros here and there, a few templates, a couple of versions of a small low-level function, etc.

> _mm256_hadd_epi32

That instruction is slow, e.g. on Ryzen it has a latency of 7. _mm256_slli_si256 and bitwise ops have a latency of 1, and can often do the same thing faster.

> readability is reduced when compared to the original scalar implementation

Solvable with a library, for example: https://github.com/Const-me/IntelIntrinsics
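As a rough illustration of reducing a vector without `_mm256_hadd_epi32`, the sketch below sums eight 32-bit integers with a 128-bit extract, two shuffles and three adds. It uses shuffles rather than the `_mm256_slli_si256` shifts named in the comment, so it is one common alternative and not necessarily what the commenter had in mind. The names `hsum8`, `hsum8_avx2` and `hsum8_scalar` are hypothetical, and the code assumes GCC or Clang for the `target` attribute and `__builtin_cpu_supports`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#endif

/* Scalar reference; unsigned arithmetic keeps wraparound well-defined. */
static int32_t hsum8_scalar(const int32_t v[8]) {
    uint32_t s = 0;
    for (int k = 0; k < 8; k++)
        s += (uint32_t)v[k];
    return (int32_t)s;
}

#ifdef HAVE_AVX2_PATH
__attribute__((target("avx2")))
static int32_t hsum8_avx2(const int32_t v[8]) {
    __m256i x = _mm256_loadu_si256((const __m256i *)v);
    __m128i lo = _mm256_castsi256_si128(x);      /* lanes 0..3, no code emitted */
    __m128i hi = _mm256_extracti128_si256(x, 1); /* lanes 4..7 */
    __m128i s = _mm_add_epi32(lo, hi);           /* 8 partial sums -> 4 */
    /* Swap the 64-bit halves and add: 4 -> 2. */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    /* Swap adjacent 32-bit lanes and add: 2 -> 1. */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);                 /* lane 0 holds the total */
}
#endif

/* Picks the AVX2 path at run time when the CPU supports it. */
static int32_t hsum8(const int32_t v[8]) {
    assert(v != NULL);
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return hsum8_avx2(v);
#endif
    return hsum8_scalar(v);
}
```

Whether this beats `_mm256_hadd_epi32` on a given CPU is something to measure rather than assume; the latency figures above are the commenter's.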

KenanSulayman, more than 5 years ago

Very interesting, and I can't wait to try it out.

It's a pity though that, based on what I understood from the website, it only produces binaries that can be linked against, rather than actually generating C / C++ code. That would be great for LTO, and would also allow for better inspection of the generated SIMD code prior to compilation, to ensure that all code is compiled by the same compiler. I guess the best way to inspect the artefacts prior to assembly is via LLVM IR.

I'm pretty happy that Intel chose to implement this based on LLVM. I'd have expected this to be sitting on top of icc.

SIMD Made Easy with Intel Implicit SPMD Program Compiler
