In case anyone is interested in reading other accounts and the HN discussions that accompany them, here are a few previous submissions of other articles:

http://news.ycombinator.com/item?id=213056 (some comments)

http://news.ycombinator.com/item?id=419166 (some comments)

http://news.ycombinator.com/item?id=573912

http://news.ycombinator.com/item?id=896092 (some comments)

http://news.ycombinator.com/item?id=1599635

http://news.ycombinator.com/item?id=2332793 (some comments)

http://news.ycombinator.com/item?id=3115168 (some comments)

http://news.ycombinator.com/item?id=3259199
I remember re-deriving this a few years ago. Short answer: if x = 2^n, then rsqrt(x) = 2^(-n/2), so the exponent just gets negated and divided by 2. If x = (1 + m) 2^n, where |m| < 1 is the fractional part of the mantissa, then rsqrt(1 + m) = 1 + rsqrt'(1) m + O(m^2) = 1 - m/2 + O(m^2) is the first-order Taylor expansion around 1. Thus rsqrt(x) ~= (1 - m/2) 2^(-n/2), so both m and n get negated and divided by 2.

This immediately suggests getting an initial guess for Newton's method by just shifting the bitwise form of an IEEE-754 floating-point number right by 1 and then doing some fix-ups for the negation.

The only hitch is that the exponent is stored in biased form rather than two's-complement form, so you have to first subtract out the bias, do the shift, and add back the bias. Also, shifting can cause a least-significant 1 bit from the exponent to shift down into the most significant bit of the mantissa. But that turns out to be useful, since it amounts to using the approximation sqrt(2) ~= 1.5. Doing all these bitwise operations is a lot of extra work compared to just doing a single shift, though. It turns out you can get almost the same effect with just a single addition of an immediate operand. Most of the bits of this addend are already predetermined by the need to handle the biasing and unbiasing of the exponent, and the remainder can easily be determined by brute force (the search space is only of size 2^24 or so) by just minimizing the approximation error over some test set. Doing that, I was able to converge to almost exactly the same 'magic number' as in Carmack's method.

My coworker Charles Bloom has a series of posts on this and related matters: http://cbloomrants.blogspot.com/2010/11/11-20-10-function-approximation-by_20.html

Here's an old email exchange I had with another coworker the last time I was thinking about this stuff, where I tried to derive as many bits of the constant as possible by reasoning: https://gist.github.com/4e1d0dff97193fe5d745

Historical aside: how this algorithm came to be associated with Carmack is an amusing and roundabout tale. My partner in crime on Telemetry at RAD is Brian Hook. Hook was one of the first employees at 3dfx, where he designed and implemented the GLIDE graphics API. Gary Tarolli was the co-founder and CTO at 3dfx, and Hook got the rsqrt code from him. When Hook left 3dfx to work with Carmack on Quake 2 and 3 at id Software, he brought that code with him and copied it into the codebase. Carmack never had anything to do with it. It's my recollection that Tarolli didn't develop the algorithm either; he had gotten it from someone else in turn.
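To make that concrete, here is a minimal C sketch of the single-operation initial guess plus one Newton refinement. It is not the poster's derivation verbatim: the constant 0x5f3759df is the one from the released Quake III source (the classic form subtracts the shifted bits, which absorbs the negation, rather than adding an immediate), and memcpy stands in for the pointer cast to sidestep the aliasing issue raised elsewhere in this thread.

    #include <stdint.h>
    #include <string.h>

    static float rsqrt(float x)
    {
        uint32_t i;
        memcpy(&i, &x, sizeof i);        /* reinterpret the float's bits */
        i = 0x5f3759df - (i >> 1);       /* negate-and-halve exponent/mantissa */
        float y;
        memcpy(&y, &i, sizeof y);        /* back to float: the initial guess */
        return y * (1.5f - 0.5f * x * y * y);  /* one Newton step */
    }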
I've always liked the post provided by BetterExplained [0], but this article is more thorough while remaining simple. I would probably still use the BetterExplained post for a first introduction, I think.

[0]: http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
This line:

> x = *(float*)&i;

breaks strict aliasing and has undefined behavior. If you compile it with g++ -O2, you may get unexpected results. The solution is to use a union, or to compile with -fno-strict-aliasing.

I used to have a lot of code like this, which worked fine on older compilers but not on newer ones. It's perfectly reasonable code, so I'm not sure why the newer compilers don't produce the expected behavior by default.
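For reference, a minimal sketch of the union-based fix (type punning through a union is well defined in C99; g++ also documents it as a supported extension, and memcpy is the fully portable alternative in C++):

    #include <stdint.h>

    union float_bits { float f; uint32_t i; };

    static float bits_to_float(uint32_t i)
    {
        union float_bits u;
        u.i = i;
        return u.f;   /* read the other member instead of *(float*)&i */
    }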
Finally an article that really breaks it down for someone like me (college calculus knowledge) to understand. Thanks for sharing this, super interesting.
If anybody wants to further play around with this, I wrote the following script in Python using matplotlib:

    import numpy
    import matplotlib.pyplot as plt
    import struct

    def power(x, p):
        # Reinterpret the float's bit pattern as a 32-bit integer.
        i = struct.unpack("i", struct.pack("f", x))[0]
        # Blend the magic constant with the bits; int() is needed because
        # struct.pack("i", ...) will not accept a float.
        i = int((1 - p) * 0x3f7a3bea + p * i)
        # Reinterpret the integer bits back as a float.
        return struct.unpack("f", struct.pack("i", i))[0]

    p = -0.5
    xs = numpy.arange(0.1, 100, 0.1)
    ys1 = [x ** p for x in xs]           # exact values
    ys2 = [power(x, p) for x in xs]      # bit-trick approximation
    plt.plot(xs, ys1, label="Real function")
    plt.plot(xs, ys2, label="Approximation")
    plt.title("Graph of x^%g" % p)       # %g so fractional p displays correctly
    plt.legend(loc="upper left")
    plt.savefig("graph.png")
Change p to whatever you want (even values greater than 1) to see different powers.
A long time ago (the late '80s or early '90s) I needed a square-root routine for a floating-point number and found an article by a Jack Crenshaw (a real rocket scientist) describing one that always converged in fewer than 5 iterations. First you normalized your FP number (converted the mantissa to 1.xxxxx) while adjusting the exponent, and then multiplied by 1.38, which gave a surprisingly close answer.

I'm guessing that (if I could find the article) the constant used in that floating-point technique and the one used in this article are somehow inversely related too!

In any case, it's a very nice article ... thanks!
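As a rough illustration only (how the 1.38 constant is applied and how the exponent is handled are guesses based on the recollection above, not Crenshaw's actual routine), the scheme might look something like this in C:

    #include <math.h>

    /* Sketch of normalize-then-iterate square root; assumes x > 0.
       Newton's method converges quadratically, so even a crude initial
       guess settles within a handful of iterations. */
    static double sqrt_approx(double x)
    {
        int e;
        double m = frexp(x, &e);        /* normalize: x = m * 2^e, m in [0.5, 1) */
        if (e & 1) { m *= 2.0; --e; }   /* make e even so 2^(e/2) is exact */
        double y = 1.38 * m;            /* remembered constant as initial guess */
        for (int k = 0; k < 5; ++k)     /* "converged in fewer than 5 iterations" */
            y = 0.5 * (y + m / y);      /* Newton step for sqrt(m) */
        return ldexp(y, e / 2);         /* sqrt(x) = sqrt(m) * 2^(e/2) */
    }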
There was a nifty article on Flipcode long ago in a similar vein: http://www.flipcode.com/archives/Fast_Approximate_Distance_Functions.shtml
I remember seeing the article about the history of this pop up, and when it started getting into the maths my brain wasn't keen. But this is a really simple and great explanation of even the heavier aspects of the math involved.
I've seen this multiple times in the past few years, and each time I've understood just a tiny bit more. I would love to see an interview with the original author of this.