Revisiting the Fast Inverse Square Root – Is It Still Useful?

145 points by nhellman about 2 years ago

12 comments

thegeomaster about 2 years ago
One important point that the article doesn't touch on is determinism. rsqrtps is implemented differently on different CPUs, and so if having reproducible floating point results is a requirement (there's a lot of use-cases for this), you simply cannot use it. Your only remaining option is to use a totally IEEE-754 compliant algorithm that is guaranteed to work the same on every CPU that implements IEEE-754 floats, and for that there's still no better approach than using the Q_rsqrt idea, of course with some modifications for the "modern age".
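
A minimal sketch of what such a "modernized" Q_rsqrt might look like, assuming 32-bit IEEE-754 floats. The name `rsqrt_portable` and the choice of two Newton-Raphson steps are illustrative, not taken from the article. It uses only integer operations plus float multiply and subtract, so no CPU-specific estimate instruction is involved:

```c
#include <stdint.h>
#include <string.h>

/* Reciprocal square root built from integer ops and IEEE-754 binary32
 * multiply/subtract only. Hypothetical helper for illustration. */
float rsqrt_portable(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);             /* reinterpret float as integer */
    bits = UINT32_C(0x5f3759df) - (bits >> 1);  /* classic initial guess */

    float y;
    memcpy(&y, &bits, sizeof y);                /* back to float */

    const float half_x = 0.5f * x;
    y = y * (1.5f - half_x * y * y);            /* Newton-Raphson step 1 */
    y = y * (1.5f - half_x * y * y);            /* Newton-Raphson step 2 */
    return y;
}
```

For bit-identical results across machines, the build would also need to keep the compiler from fusing the multiply and subtract into an FMA and from applying fast-math rewrites (for example `-ffp-contract=off` on GCC and Clang).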
bee_rider about 2 years ago
I wonder to what extent the Newton-Raphson strategy plays nicer with big fancy reordering/pipelining/superscalar chips. It has more little instructions to shuffle around, so my gut says it should be beneficial, but the gut can be misleading for this kind of stuff.

Also,

-funsafe-math-optimizations

Fun, safe math optimizations should be turned on by default! ;)
stephc_int13 about 2 years ago
The benchmark should not average the values but take the lowest.

I would not write a better explanation than Daniel Lemire on his blog:

https://lemire.me/blog/2023/04/06/are-your-memory-bound-benchmarking-timings-normally-distributed/
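
To illustrate the point, here is a small sketch comparing the two summaries on a set of timing samples. The helper names `min_sample` and `mean_sample` are made up for this example, and both assume at least one sample:

```c
#include <stddef.h>

/* Smallest sample: the run least disturbed by interrupts, cache misses
 * and other noise. Requires n >= 1. */
double min_sample(const double *samples, size_t n)
{
    double best = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}

/* Arithmetic mean, for comparison: a single slow outlier shifts it. */
double mean_sample(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}
```

With samples of 10, 10, 10 and 50 (one run hit by noise), the minimum stays at 10 while the mean doubles to 20.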
danieldk about 2 years ago
A colleague and I were once discussing the fast inverse square root and joked that we need to make a (neural net) activation function that uses an inverse square root as an excuse to use the fast inverse square root. At any rate, I did come up with an activation function that is very similar to Swish/GELU but uses an inverse square root:

https://twitter.com/danieldekok/status/1484898130441166853?s=61&t=D_7PZsTJCaa6-HzOReAhwg

It's quite a bit cheaper, because it doesn't need expensive elementary functions like exp or erf.

(We did add it to Thinc: https://thinc.ai/docs/api-layers#dish)
jabl about 2 years ago
It's a shame that -fno-math-errno isn't the default. Without it, many common operations are pessimized, as can be seen in the article. Also e.g. a simple sqrt() call like

    #include <math.h>

    double mysqrt(double d) { return sqrt(d); }

with and without -fno-math-errno: https://godbolt.org/z/bvrz9r8ce

One can see that with -fno-math-errno the function can be entirely inlined. But if errno is enabled, it has to first check whether the input is negative, and in that case call the libc sqrt() function, which sets errno.

As for why it's not the default, I guess it's historical. The errno approach was common back in the days before IEEE 754 with its exception model provided another way.

E.g. for glibc: https://man7.org/linux/man-pages/man7/math_error.7.html

Musl libc, being newer, does away with that and never sets errno in libm functions: https://wiki.musl-libc.org/mathematical-library.html
commandlinefan about 2 years ago
I've always wondered why they did the casting rather than a union like:

```c
float my_rsqrt( float number )
{
    float x2;
    union {
        float y;
        long i;
    } u;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    u.y = number;
    u.i = 0x5f3759df - ( u.i >> 1 );                  // what the fuck?
    u.y = u.y * ( threehalfs - ( x2 * u.y * u.y ) );  // 1st iteration

    return u.y;
}
```

Were unions not supported by the compilers back then?
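
One caveat with the union above: `long` is 64 bits on most 64-bit Unix-like systems, so it no longer overlays a 32-bit float exactly. A sketch of the same type-punning idea with a fixed-width integer follows. The helper names are made up for illustration, and it assumes C99 or later, where reading a union member other than the one last written reinterprets the bytes (C++ does not allow this):

```c
#include <stdint.h>

/* View the bit pattern of a float as a 32-bit integer via a union. */
uint32_t float_to_bits(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* The reverse direction: build a float from a 32-bit pattern. */
float bits_to_float(uint32_t u)
{
    union { float f; uint32_t u; } pun;
    pun.u = u;
    return pun.f;
}
```

For example, on a platform with IEEE-754 binary32 floats, `float_to_bits(1.0f)` yields `0x3f800000`.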
geertj about 2 years ago
Awesome write up, a lot of effort must have gone into this.

I believe the benchmark program outputs the wrong units? It should be picoseconds (ps) instead of femtoseconds (fs)?
jabl about 2 years ago
As a small nit on the benchmark code, it should use CLOCK_MONOTONIC rather than CLOCK_REALTIME.
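
A sketch of interval timing with the monotonic clock, assuming a POSIX system. The helpers `elapsed_ns` and `time_call_ns` are hypothetical and not taken from the article's benchmark. CLOCK_MONOTONIC is not affected by wall-clock adjustments such as NTP steps or manual date changes, which is why it suits measuring durations:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict modes */
#endif
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two readings; end must not precede start. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Time a single call of fn() using the monotonic clock. */
int64_t time_call_ns(void (*fn)(void))
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1);
}
```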
clircle about 2 years ago
Inverse square root and reciprocal square root are not the same. Inverse square root means x^2, not 1/sqrt(x).
mikerg87 about 2 years ago
Does anyone have an idea which one is more power efficient? Is there a tool that could help make that determination?
seventytwo about 2 years ago
Hell of a write up! Nice work.
nikanj about 2 years ago
This would never pass the code review today. "Why are you optimizing this?" "Why use a magic constant?" "Optimization is evil!"