Fast Inverse Square Root

190 points | by bertzzie | over 12 years ago

13 comments

ColinWright | over 12 years ago
In case anyone is interested in reading other accounts and the HN discussions that accompany them, here are a few previous submissions of other articles:

http://news.ycombinator.com/item?id=213056 <- Some comments
http://news.ycombinator.com/item?id=419166 <- Some comments
http://news.ycombinator.com/item?id=573912
http://news.ycombinator.com/item?id=896092 <- Some comments
http://news.ycombinator.com/item?id=1599635
http://news.ycombinator.com/item?id=2332793 <- Some comments
http://news.ycombinator.com/item?id=3115168 <- Some comments
http://news.ycombinator.com/item?id=3259199
psykotic | over 12 years ago
I remember re-deriving this a few years ago. Short answer: If x = 2^n then rsqrt(x) = 2^(-n/2), so the exponent just gets negated and divided by 2. If x = (1 + m) 2^n, where |m| < 1 is the fractional part of the mantissa, then rsqrt(1 + m) = 1 + rsqrt'(1) m + O(m^2) = 1 - m/2 + O(m^2) is the first-order Taylor expansion around 1. Thus rsqrt(x) ~= (1 - m/2) 2^(-n/2), so both m and n get negated and divided by 2.

This immediately suggests getting an initial guess for Newton's method by just shifting the bitwise form of an IEEE-754 floating point number right by 1 and then doing some fix-ups for the negation.

The only hitch is that the exponent is stored in biased form rather than 2's complement form, so you have to first subtract out the bias, do the shift, and add back in the bias. Also, shifting can cause a least-significant 1 bit from the exponent to shift down into the most significant bit of the mantissa. But that turns out to be useful, since it amounts to using the approximation sqrt(2) ~= 1.5. Doing all these bitwise operations is a lot of extra work compared to just doing a single shift, though. It turns out you can get almost the same effect with just a single addition of an immediate operand. Most of the bits of this addend are already predetermined by the need to handle the biasing and unbiasing of the exponent, and the remainder can easily be determined by brute force (the search space is only of size 2^24 or so) by just minimizing the approximation error over some test set. Doing that, I was able to converge to almost exactly the same 'magic number' as in Carmack's method.

My coworker Charles Bloom has a series of posts on this and related matters: http://cbloomrants.blogspot.com/2010/11/11-20-10-function-approximation-by_20.html

Here's an old email exchange I had with another coworker last time I was thinking about this stuff, where I tried to derive as many bits of the constant as possible by reasoning: https://gist.github.com/4e1d0dff97193fe5d745

Historical aside: How this algorithm came to be associated with Carmack is an amusing and roundabout tale. My partner in crime on Telemetry at RAD is Brian Hook. Hook was one of the first employees at 3dfx, where he designed and implemented the GLIDE graphics API. Gary Tarolli was the co-founder and CTO at 3dfx, and Hook got the rsqrt code from him. When Hook left 3dfx to work with Carmack on Quake 2 and 3 at id Software, he brought that code with him and copied it into the codebase. Carmack never had anything to do with it. It's my recollection Tarolli didn't develop the algorithm either -- he had gotten it from someone else in turn.
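The negate-and-halve trick described above can be sketched in a few lines of C. This is a hedged illustration, not Tarolli's original code: the constant 0x5f3759df is the widely circulated magic number, and memcpy is used for the type pun to stay within strict-aliasing rules.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the shift-plus-magic-constant approximation described above.
   Shifting the IEEE-754 bits right by one halves the exponent (and
   mantissa); subtracting the shifted bits from a constant negates them
   and folds the exponent-bias fixups into a single immediate operand. */
static float rsqrt_approx(float x) {
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);           /* type-pun without aliasing UB */
    i = 0x5f3759df - (i >> 1);          /* initial guess for 1/sqrt(x) */
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);  /* one Newton-Raphson refinement */
    return y;
}
```

The raw bit-level guess alone is good to within a few percent; one Newton step brings the relative error down to a fraction of a percent.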
jimminy | over 12 years ago
I've always liked the post provided by BetterExplained[0], but this article is more thorough while remaining simple. I would probably still use the BetterExplained post for a first introduction, though.

[0]: http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
boas | over 12 years ago
This line:

    x = *(float*)&i;

breaks strict aliasing and has undefined behavior. If you compile it with g++ -O2, you may get unexpected results. The solution is to use a union, or to compile with -fno-strict-aliasing.

I used to have a lot of code like this, which worked fine on older compilers but not on newer ones. It's perfectly reasonable code, so I'm not sure why the newer compilers don't produce the expected behavior by default.
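For reference, here is a sketch of two aliasing-safe ways to do that reinterpretation: the union fix mentioned above (well-defined in C99/C11, and a common compiler extension in C++), and memcpy, which mainstream compilers optimize down to a plain register move.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret raw IEEE-754 bits as a float via memcpy: well-defined in
   both C and C++, and compiled to a single move by modern compilers. */
static float bits_to_float_memcpy(uint32_t i) {
    float x;
    memcpy(&x, &i, sizeof x);
    return x;
}

/* The same pun through a union: defined behavior in C99/C11, and
   supported as an extension by the major C++ compilers. */
static float bits_to_float_union(uint32_t i) {
    union { uint32_t i; float f; } u;
    u.i = i;
    return u.f;
}
```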
prezjordan | over 12 years ago
Finally an article that really breaks it down for someone like me (college calculus knowledge) to understand. Thanks for sharing this, super interesting.
jgeralnik | over 12 years ago
If anybody wants to further play around with this, I wrote the following script in python using matplotlib:

```python
import numpy
import matplotlib.pyplot as plt
import struct

def power(x, p):
    # Reinterpret the float's bits as an int, blend with the magic
    # constant, and reinterpret back.
    i = struct.unpack("i", struct.pack("f", x))[0]
    i = int((1 - p) * 0x3f7a3bea + p * i)
    return struct.unpack("f", struct.pack("i", i))[0]

p = -0.5
xs = numpy.arange(0.1, 100, 0.1)
ys1 = [x ** p for x in xs]
ys2 = [power(x, p) for x in xs]
plt.plot(xs, ys1, label="Real function")
plt.plot(xs, ys2, label="Approximation")
plt.title("Graph of x^%g" % p)
plt.legend(loc="upper left")
plt.savefig("graph.png")
```

Change p to whatever you want (even values greater than 1) to see different powers.
smoyer | over 12 years ago
A long time ago (the late '80s or early '90s) I needed a square root routine for a floating point number, and found an article by Jack Crenshaw (a real rocket scientist) describing one that always converged in fewer than 5 iterations. First you normalized your FP number (converted the mantissa to 1.xxxxx) while adjusting the exponent, and then multiplied by 1.38, which gave a surprisingly close first answer.

I'm guessing that (if I could find the article) the constant used in that floating point technique and the one used in this article are somehow inversely related too!

In any case, it's a very nice article ... thanks!
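The normalize-then-iterate scheme recalled above can be sketched like this. The linear first guess below is my own illustrative fit, not Crenshaw's published constants, but the point stands either way: with any reasonable starting value on the reduced range, Newton's method converges in a handful of iterations.

```c
#include <math.h>

/* Sketch of a normalize-then-iterate square root for x > 0: reduce x to
   m * 2^e with an even exponent, take a crude linear first guess for
   sqrt(m), and refine with Newton's method (y <- (y + m/y) / 2).
   The 0.59/0.42 coefficients are illustrative, not Crenshaw's. */
static double sqrt_newton(double x, int *iters) {
    int e;
    double m = frexp(x, &e);       /* x = m * 2^e with m in [0.5, 1) */
    if (e & 1) { m *= 2.0; e--; }  /* make the exponent even so it halves cleanly */
    double y = 0.59 * m + 0.42;    /* crude linear first guess for sqrt(m) */
    *iters = 0;
    for (int k = 0; k < 60; ++k) { /* hard cap only as a safety net */
        double next = 0.5 * (y + m / y);
        ++*iters;
        if (fabs(next - y) <= 1e-15 * next) { y = next; break; }
        y = next;
    }
    return ldexp(y, e / 2);        /* reattach the halved exponent */
}
```

Because Newton's method doubles the number of correct digits per step, a guess that is already within a few percent reaches full double precision in four or five iterations.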
IgniteTheSun | over 12 years ago
Not to be too pedantic, but this is the reciprocal of the square root. (The inverse of the square root operation would be the square operation.)
angersock | over 12 years ago
There was a nifty article on Flipcode long ago (http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml) in a similar vein.
nicholassmith | over 12 years ago
I remember seeing the article about the history of this pop up, and when it started getting into the maths my brain wasn't keen. But this is a really simple and great explanation of even the heavier aspects of the math involved.
SeanLuke | over 12 years ago
If only there were an equivalent fast atan2.
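There are, in fact, serviceable fast atan2 approximations in the same spirit. As a hedged sketch (the polynomial and its roughly 0.004-radian error bound come from the standard approximation literature; none of this is from Quake): approximate atan on [-1, 1] with a short polynomial and fold the rest of the plane onto that range by symmetry.

```c
#include <math.h>

/* Sketch of a fast atan2: atan(z) ~ (pi/4)z + 0.273 z (1 - |z|) on
   [0, 1] (max error ~0.004 rad), extended to the full plane by octant
   reduction. Constants are from the common approximation literature. */
static float atan2_approx(float y, float x) {
    const float PI = 3.14159265f, PI_2 = 1.57079633f;
    if (x == 0.0f && y == 0.0f) return 0.0f; /* conventional choice */
    float ay = fabsf(y), ax = fabsf(x);
    float z = (ax > ay) ? ay / ax : ax / ay; /* reduce to z in [0, 1] */
    float a = (PI / 4.0f) * z + 0.273f * z * (1.0f - z);
    if (ay > ax) a = PI_2 - a;   /* reflect across the 45-degree line */
    if (x < 0.0f) a = PI - a;    /* left half-plane */
    return (y < 0.0f) ? -a : a;  /* lower half-plane */
}
```

Trading the polynomial degree against accuracy is the usual knob: a cubic or quintic term shrinks the error by another order of magnitude at the cost of a couple of multiplies.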
k2xl | over 12 years ago
I've seen this multiple times in the past few years; each time I've understood just a tiny bit more. I would love to see an interview with the original author of this.
FredBrach | over 12 years ago
"Why?" Also because x * (1/sqrt(x)) == sqrt(x), so on a Pentium a sqrt was only about 3 cycles more than an inverse sqrt; pretty cheap for a sqrt.