TechEcho

Bit twiddling with Arm Neon: beating SSE movemasks, counting bits and more

96 points by danlark over 2 years ago

6 comments

olliej over 2 years ago
This is a really interesting article. I was expecting some obviously biased and/or marketing horror by virtue of it being on arm.com.

It's actually an interesting breakdown of the ways NEON differs from SSE, and how a "direct" translation may well be suboptimal. Their first example is really illustrative of this. SSE has an instruction that pulls the top (I think?) bit of each register and creates an 8-bit mask from those. You can do something similar in NEON, but the perf is apparently terrible. But NEON has an instruction that packs some bits from each register into a 64-bit value, and you can go from that to the masking behaviour you were presumably trying for originally, but much faster.

The other examples and case studies are similarly interesting.
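
To make the comparison above concrete, here is a minimal sketch in C. The comment does not name the instructions; the sketch assumes they are SSE2 `_mm_movemask_epi8` (top bit of each of the 16 bytes in a 128-bit register, packed into a 16-bit mask) and NEON shift-right-and-narrow, `vshrn_n_u16(v, 4)`, which yields a 64-bit value holding 4 bits per input byte. The scalar functions only model those semantics so the example runs on any machine; all function names are hypothetical, and the input is assumed to be a comparison result where every byte is `0x00` or `0xFF`.

```c
#include <stdint.h>

/* Scalar model of SSE2 _mm_movemask_epi8: take the top bit of each of
 * the 16 bytes and pack them into a 16-bit mask (bit i <- byte i). */
static inline uint16_t movemask_model(const uint8_t b[16]) {
    uint16_t m = 0;
    for (int i = 0; i < 16; i++)
        m |= (uint16_t)((b[i] >> 7) << i);
    return m;
}

/* Scalar model of the assumed NEON idiom vshrn_n_u16(v, 4): view the
 * 16 bytes as eight little-endian 16-bit lanes, shift each right by 4,
 * and keep the low 8 bits. For 0x00/0xFF input bytes, nibble j of the
 * 64-bit result is 0xF exactly when input byte j was 0xFF. */
static inline uint64_t nibble_mask_model(const uint8_t b[16]) {
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        uint16_t lane = (uint16_t)(b[2 * i] | (b[2 * i + 1] << 8));
        uint8_t narrowed = (uint8_t)(lane >> 4);
        m |= (uint64_t)narrowed << (8 * i);
    }
    return m;
}

/* Index of the first matching byte, or -1 if none. Each byte owns
 * 4 mask bits, so the lowest set bit position is divided by 4. */
static inline int first_match_index(uint64_t nibble_mask) {
    if (nibble_mask == 0)
        return -1;
    int bit = 0;
    while (((nibble_mask >> bit) & 1) == 0)
        bit++;
    return bit >> 2;
}

#if defined(__ARM_NEON)
#include <arm_neon.h>
/* The same nibble mask using real NEON intrinsics (compiled on Arm only). */
static inline uint64_t nibble_mask_neon(uint8x16_t cmp) {
    uint8x8_t narrowed = vshrn_n_u16(vreinterpretq_u16_u8(cmp), 4);
    return vget_lane_u64(vreinterpret_u64_u8(narrowed), 0);
}
#endif
```

Code written for a movemask typically finds the first match by counting trailing zeros in the mask; with the nibble mask the same count works once it is divided by 4, which is what `first_match_index` does with a portable loop.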
zX41ZdbW over 2 years ago
It improves string comparison and sorting in ClickHouse by 15%: https://github.com/ClickHouse/ClickHouse/pull/38093
alas44 over 2 years ago
Really interesting, thanks for sharing.

From the article also, 10-20% improvement (I guess in Instructions Per Cycle) on some str methods in glibc: https://sourceware.org/git/?p=glibc.git;a=commit;h=3c9980698988ef64072f1fac339b180f52792faf
david2ndaccount over 2 years ago
Great article, applied it to my parser where I was emulating movemask and it did indeed speed it up a few percent.
terrelln over 2 years ago
Awesome work! We were very happy to receive the patches to zstd to optimize ARM performance!
dqh over 2 years ago
Can anyone recommend a good practical learning resource on adding vector optimisations to C code?

We could use some further optimisation in the emulated screen rendering code in VICE, particularly on ARM.