
TIL: Go's CompareAndSwap is not always Compare-and-swap

22 points by enz, over 1 year ago

3 comments

kevmo314, over 1 year ago
Related anecdote: a coworker suggested I use https://pkg.go.dev/math#FMA to optimize a multiply and add, which surprised me quite a bit: why would there be an opt-in fused multiply-add? Indeed, if you dive into the code (https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/math/fma.go;l=95), it's quite a bit more complicated than your normal a*x+b syntax, so how could this possibly yield a performance improvement?

It turns out, with some more research (https://github.com/golang/go/issues/25819), that the function was added not to guarantee performance but to guarantee *precision*: a fused multiply-add performs both operations with a single rounding, so it yields higher precision than doing them stepwise, and in certain situations you'd like to guarantee that precision. Which is cool, but absolutely not what I would've guessed on first read, and the first commenter on that issue closed it with the same take!

So I was able to push back on using math.FMA() as a performance optimization, with a small personal takeaway: don't optimize unless I really know what the thing is doing.
wahern, over 1 year ago
AFAIU, LL/SC is the more general, more powerful primitive. In theory LL/SC can serve as the hardware primitive for a much broader range of lock-free algorithms, as well as for software transactional memory generally. CAS algorithms are more commonly seen because CAS is the lowest common denominator, and the best x86 offers. But because of the limited number of addresses that can be monitored in hardware without sacrificing performance or efficiency, in practice LL/SC implementations are weak and only slightly more useful than [double] CAS.
perryizgr8, over 1 year ago
```
// Check support for LSE atomics
MOVBU	internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
CBZ	R4, load_store_loop
```

Why is this a runtime decision? Shouldn't the compiler know whether the target machine supports the instruction?