It's interesting that SIMD came to the mainstream 25 years ago, and yet compilers, our apps, and PL tech are still quite far from utilizing it effectively outside of nonportable, manually coded SIMD-aware compute kernels written in glorified assembler. There are some exceptions, like ispc and GPU languages such as OpenCL and Futhark (GPU people say "cores" when they mean SIMD lanes!)...
The whole book looks full of interesting low-level techniques. Do high-level JIT compilers like V8 or the JVM apply vectorisation or any of the other optimisations mentioned there, or is that level of fine-tuned performance only possible if you code it manually in C++?
This book is fantastic - it comes at a perfect time too, as I'm getting more work-related projects of the 'make slow code fast' variety.<p>Going to be reading this - and looking forward to any future parts!
Since the performance for array sizes <L1-size and <L2-size is similar, I would like to see an attempt to improve B.
B = L2-size / 2 / sizeof(int) - 16 might produce better results.<p>Note also that _mm_broadcast_ss() is faster on newer processors.
I've seen the author advertise his book in codeforces.com blog posts before, if you want somewhere to reach him: <a href="https://codeforces.com/blog/entry/99790" rel="nofollow">https://codeforces.com/blog/entry/99790</a><p>That might be a better intro than a random chapter of the book, and it contextualizes why you might want to learn SIMD programming (i.e., up to an order of magnitude speed-up vs STL implementations).