
Exploring .NET Core platform intrinsics: Part 4 – Alignment and pipelining

107 points by benaadams, nearly 7 years ago

2 comments

zvrba, nearly 7 years ago
Further optimization potential: the four lines

```csharp
sum = Avx2.Add(block0, sum);
sum = Avx2.Add(block1, sum);
sum = Avx2.Add(block2, sum);
sum = Avx2.Add(block3, sum);
```

all have a serializing dependency on the sum variable. But (integer) addition is associative and commutative, so you could sum in a tree-like manner, ending up with only a single serializing dependency:

```csharp
sum01 = Avx2.Add(block0, block1);
sum23 = Avx2.Add(block2, block3); // These two run in parallel
sum = Avx2.Add(sum, sum01);       // sum01 hopefully ready; parallel with sum23
sum = Avx2.Add(sum, sum23);       // sum23 hopefully ready
```

Here only the last line serializes with the previous one. Maybe the hardware is smart enough to rename the registers and do the same thing internally, but it'd be interesting to benchmark it.
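The reassociation suggested above can be checked with a small sketch. It is written in C rather than C#, assuming GCC or Clang: the hypothetical type `v8u` uses those compilers' vector extensions as a stand-in for `Vector256<uint>`, and `+` stands in for `Avx2.Add`. All function names are illustrative, not from the article. Because unsigned lanes wrap on overflow, every ordering gives bit-identical results; what changes is how many dependent adds run through `sum` (4, 2, or 1). Whether the shorter chain is measurably faster is, as the comment says, something only a benchmark would show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for .NET's Vector256<uint>: a GCC/Clang vector
   extension with eight 32-bit lanes. Unsigned lanes wrap on overflow, so
   lane-wise addition is associative and commutative, like Avx2.Add. */
typedef unsigned int v8u __attribute__((vector_size(32)));

/* Serial form: each add consumes the previous sum, so the four adds
   form one dependency chain of length 4 through sum. */
static v8u accumulate_serial(v8u sum, v8u b0, v8u b1, v8u b2, v8u b3) {
    sum = b0 + sum;
    sum = b1 + sum;
    sum = b2 + sum;
    sum = b3 + sum;
    return sum;
}

/* Pairwise form from the comment: b0+b1 and b2+b3 depend neither on
   each other nor on sum; only the last two adds touch sum (chain of 2). */
static v8u accumulate_pairwise(v8u sum, v8u b0, v8u b1, v8u b2, v8u b3) {
    v8u sum01 = b0 + b1;
    v8u sum23 = b2 + b3; /* independent of sum01, can issue alongside it */
    sum = sum + sum01;
    sum = sum + sum23;
    return sum;
}

/* Full tree: reduce the four blocks first, then one add into sum
   (chain of 1 through sum). */
static v8u accumulate_tree(v8u sum, v8u b0, v8u b1, v8u b2, v8u b3) {
    return sum + ((b0 + b1) + (b2 + b3));
}

/* Lane-by-lane equality check (== on vectors yields a mask vector in
   GCC/Clang, so lanes are compared explicitly). */
static int v8u_equal(v8u a, v8u b) {
    for (size_t i = 0; i < 8; i++) {
        if (a[i] != b[i]) return 0;
    }
    return 1;
}

/* Sums n blocks, four per iteration, with the pairwise form.
   For brevity, n is assumed to be a multiple of 4. */
static v8u sum_blocks(const v8u *blocks, size_t n) {
    v8u sum = {0};
    for (size_t i = 0; i < n; i += 4) {
        sum = accumulate_pairwise(sum, blocks[i], blocks[i + 1],
                                  blocks[i + 2], blocks[i + 3]);
    }
    return sum;
}
```

In `sum_blocks`, `sum` is the only value carried from one iteration to the next, so its chain length is what limits how far consecutive iterations can overlap; the block-pair adds inside an iteration do not wait on the previous iteration at all.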
rossnordby, nearly 7 years ago
Seeing the intrinsics APIs get filled out (in the open, no less) has been pretty exciting. Even five years ago I would not have predicted that something like AES could be implemented competitively in C#.

It's remarkable how fast the language and runtime have evolved for performance. It wasn't that long ago that I was manually inlining Vector3 operators to try to squeeze a few extra cycles out of XNA on the Xbox 360.