
TIL: Go's CompareAndSwap is not always Compare-and-swap

22 points by enz, over 1 year ago

3 comments

kevmo314, over 1 year ago
Related anecdote: a coworker suggested I use https://pkg.go.dev/math#FMA to optimize a multiply and add, which surprised me quite a bit: why would there be an opt-in fused multiply-add? Indeed, if you dive into the code (https://cs.opensource.google/go/go/+/refs/tags/go1.21.5:src/math/fma.go;l=95), it's quite a bit more complicated than your normal a*x+b syntax, so how could this possibly yield a performance improvement?

It turns out, with some more research (https://github.com/golang/go/issues/25819), that the function was added not to guarantee performance but to guarantee *precision*: a fused multiply-add performs both operations with a single rounding, so it yields higher precision than doing them stepwise, and in certain situations you'd like to guarantee that precision. Which is cool, but absolutely not what I would've guessed on first read, and the first commenter on that issue closed it with the same take!

So I was able to push back on using math.FMA() as a performance optimization, with a small personal takeaway: don't optimize unless I really know what the thing is doing.
wahern, over 1 year ago
AFAIU, LL/SC is the more general, more powerful primitive. In theory LL/SC can serve as the hardware primitive for a much broader range of lock-free algorithms, as well as for software transactional memory generally. CAS algorithms are more commonly seen because CAS is the lowest common denominator, and the best x86 offers. But because of the limited number of addresses that can be monitored in hardware without sacrificing performance or efficiency, in practice LL/SC implementations are weak and only slightly more useful than [double] CAS.
perryizgr8, over 1 year ago
```
// Check support for LSE atomics
MOVBU	internal∕cpu·ARM64+const_offsetARM64HasATOMICS(SB), R4
CBZ	R4, load_store_loop
```

Why is this a runtime decision? Shouldn't the compiler know whether the target machine supports the instruction?