TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Prefix Sum with SIMD

112 points by g0xA52A2A over 3 years ago

6 comments

fulafel over 3 years ago
It's interesting that SIMD came to the mainstream 25 years ago, yet compilers, our apps, and PL tech are still quite far from effectively utilizing it outside of non-portable, hand-coded, SIMD-aware compute kernels written in glorified assembler. There are some exceptions, like ispc and GPU languages such as OpenCL and Futhark (GPU people say "cores" when they mean SIMD lanes!)...
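To make the comment concrete, here is a minimal sketch of the kind of hand-written kernel it refers to: an in-register inclusive prefix sum over four 32-bit lanes using the shift-and-add (log-step) pattern. It assumes an x86 target with SSE2 (`_mm_slli_si128`, `_mm_add_epi32`); the `prefix4` name and the scalar fallback for other targets are illustrative, not code from the book.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: inclusive prefix sum of 4 int32 values, in place.
   After the call, v[i] == v[0] + ... + v[i] (original values). */
static void prefix4(int32_t v[4]) {
    assert(v != NULL);
#if defined(__SSE2__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    /* Step 1: shift left by one lane (4 bytes), lane 0 becomes 0, then add. */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4));
    /* Step 2: shift left by two lanes (8 bytes), then add. */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8));
    _mm_storeu_si128((__m128i *)v, x);
#else
    /* Portable fallback: the same two shift-and-add steps on a plain array. */
    int32_t t[4];
    t[0] = 0; t[1] = v[0]; t[2] = v[1]; t[3] = v[2];   /* shift by 1 lane */
    for (int i = 0; i < 4; i++) v[i] += t[i];
    t[0] = 0; t[1] = 0; t[2] = v[0]; t[3] = v[1];      /* shift by 2 lanes */
    for (int i = 0; i < 4; i++) v[i] += t[i];
#endif
}
```

Two shift-and-add steps cover four lanes; a wider register would need log2(lanes) steps, and a full-array prefix sum still has to carry each group's total into the next group, which is the part compilers generally do not derive on their own from a plain loop.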
udbhavs over 3 years ago
The whole book looks full of interesting low-level techniques. Do high-level JIT compilers like V8 or the JVM apply vectorisation or any of the other optimisations mentioned there, or is that level of fine-tuned performance only possible if you code it manually in C++?
JimBlackwood over 3 years ago
This book is fantastic, and it comes at a perfect time too, as I'm getting more work-related projects about ‘make slow code fast’.

Going to be reading this, and looking forward to any future parts!
mhkool over 3 years ago
Since the performance for array sizes < L1 size and < L2 size is similar, I would like to see an attempt to improve B. B = L2-size / 2 / sizeof(int) - 16 might produce better results.

Note also that _mm_broadcast_ss() is faster on newer processors.
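The suggested formula is easy to evaluate: with an assumed 256 KiB L2 and a 4-byte `int`, B = 262144 / 2 / 4 - 16 = 32752 elements. The sketch below computes that value and plugs it into a scalar stand-in for a blocked two-pass prefix sum, where each block of B elements goes through both passes before the next block is touched, so the second pass ideally reads data that is still in cache. The function names are hypothetical, and the scalar inner loops mark where a real kernel would use SIMD; this is not the book's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: block size in ints per the comment's formula,
   B = L2-size / 2 / sizeof(int) - 16. */
static size_t suggested_block_size(size_t l2_bytes) {
    size_t b = l2_bytes / 2 / sizeof(int);
    assert(b > 16);
    return b - 16;
}

/* Blocked two-pass inclusive prefix sum (scalar stand-in, hypothetical).
   Within each block: pass 1 computes local prefix sums of 4-element groups,
   pass 2 adds the running total of everything before each group. */
static void prefix_sum_blocked(int *a, size_t n, size_t B) {
    assert(B > 0);
    int carry = 0;                                /* sum of all earlier groups */
    for (size_t start = 0; start < n; start += B) {
        size_t end = (n - start > B) ? start + B : n;
        /* Pass 1: local inclusive prefix sum inside each group of 4. */
        for (size_t g = start; g < end; g += 4) {
            size_t gend = (end - g > 4) ? g + 4 : end;
            for (size_t i = g + 1; i < gend; i++) a[i] += a[i - 1];
        }
        /* Pass 2: shift each group by the carry, then extend the carry. */
        for (size_t g = start; g < end; g += 4) {
            size_t gend = (end - g > 4) ? g + 4 : end;
            int group_total = a[gend - 1];        /* local sum of this group */
            for (size_t i = g; i < gend; i++) a[i] += carry;
            carry += group_total;
        }
    }
}
```

Usage would look like `prefix_sum_blocked(a, n, suggested_block_size(256 * 1024))`; whether this B actually beats the book's choice would have to be measured on the target machine.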
gfd over 3 years ago
I've seen the author advertise his book on codeforces.com blogs before, if you want somewhere to reach him: https://codeforces.com/blog/entry/99790

That might be a better intro than a random chapter of the book, and it contextualizes why you might want to learn SIMD programming (i.e., up to an order of magnitude speed-up vs STL implementations).
NohatCoder over 3 years ago
I don't get why the baseline is so slow; a read, a write, and an add per clock should all be within the capabilities of a modern processor.
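For reference, a typical scalar baseline (an assumed form; the book's exact code may differ) does exactly the per-element work the comment lists: one load, one add, one store. One property worth noticing is that each add consumes the previous iteration's sum, so the adds form a serial dependency chain rather than independent operations. The name `prefix_sum_scalar` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar inclusive prefix sum, in place. */
static void prefix_sum_scalar(int *a, size_t n) {
    assert(a != NULL || n == 0);
    int s = 0;              /* running sum, kept in a register */
    for (size_t i = 0; i < n; i++) {
        s += a[i];          /* load + add; depends on the previous s */
        a[i] = s;           /* store */
    }
}
```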