
Everything I know about the fast inverse square root algorithm

309 points | by atan2 | 12 months ago

19 comments

johndough, 12 months ago

If your computer was built after 1999, it probably supports the SSE instruction set. That gives you the _mm_rsqrt_ps intrinsic, which is faster and computes four reciprocal square roots at once: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_rsqrt_ps&ig_expand=5642&ssetechs=SSE

That being said, the techniques discussed here are not totally irrelevant (yet). There still exists some hardware with fast instructions for float/int conversion but lacking rsqrt, sqrt, pow, and log instructions, all of which can be approximated with this nice trick.
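As a sketch of what that intrinsic looks like in use (assuming an x86 target with SSE; the wrapper name rsqrt4 is just for illustration):

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps lives here */

/* Approximate four reciprocal square roots at once. Per Intel's
   documentation the relative error is at most 1.5 * 2^-12, so this
   is a low-precision estimate, not an exact 1/sqrt. */
static void rsqrt4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);   /* load four unaligned floats */
    __m128 r = _mm_rsqrt_ps(v);    /* hardware rsqrt approximation */
    _mm_storeu_ps(out, r);         /* store the four results */
}
```

A game engine would typically batch vector normalizations to feed four at a time through a path like this.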
pcwalton, 11 months ago

To nitpick the article a bit:

> It's important to note that this algorithm is very much of its time. Back when Quake 3 was released in 1999, computing an inverse square root was a slow, expensive process. The game had to compute hundreds or thousands of them per second in order to solve lighting equations, and other 3D vector calculations that rely on normalization. These days, on modern hardware, not only would a calculation like this not take place on the CPU, even if it did, it would be fast due to much more advanced dedicated floating point hardware.

Calculations like this definitely take place on the CPU all the time. It's a common misconception that games and other FLOP-heavy apps want to offload all floating-point operations to the GPU. In fact, it really only makes sense to offload large *uniform* workloads to the GPU. If you're doing one-off vector normalization (say, as part of the rotation matrix construction needed to make one object face another), then you're going to want to stay on the CPU, because the CPU is faster at that. In fact, the CPU would remain faster at *single* floating-point operations even if you didn't take the GPU transfer time into account: the GPU typically runs at a slower clock rate and relies on parallelism to achieve its high FLOP count.
zzo38computer, 12 months ago

I wrote an implementation in MMIX:

    % Constants
    FISRCON GREG #5FE6EB50C7B537A9
    THREHAF GREG #3FF8000000000000
    % Save half of the original number
    OR $2,$0,0
    INCH $2,#FFF0
    % Bit level hacking
    SRU $1,$0,1
    SUBU $0,FISRCON,$1
    % First iteration
    FMUL $1,$2,$0
    FMUL $1,$1,$0
    FSUB $1,THREHAF,$1
    FMUL $0,$0,$1
    % Second iteration
    FMUL $1,$2,$0
    FMUL $1,$1,$0
    FSUB $1,THREHAF,$1
    FMUL $0,$0,$1

This implementation makes the assumption that the original number is greater than 2^-1021.
rogerallen, 12 months ago

If you're interested, Wikipedia also has a decent discussion of this function and its history: https://en.wikipedia.org/wiki/Fast_inverse_square_root
ncruces, 12 months ago

I collected a few of these here:

https://github.com/ncruces/fastmath/blob/main/fast.go

Also see this StackOverflow question:

https://stackoverflow.com/questions/32042673/optimized-low-accuracy-approximation-to-rootnx-n
g15jv2dp, 11 months ago

Time for nitpicks, sorry.

The formulas for the floats have typos. They should read (-1)^S, not -1^S (which always equals -1).

Interpreting the raw bit patterns isn't a piecewise linear approximation of the logarithm. The lines between the data points on the blue graph don't actually exist; it's not possible for a bit to be half set to 1. It's rather a discrete version of the logarithm: the only data points that exist (where the red and blue lines meet) are literally equal to the (scaled, shifted) logarithm.

Other than that, nice post!
timerol, 11 months ago

This is a good post explaining a lot of interesting concepts, with a section that has surprisingly bad algebra.

> The exact steps to go from the first form to this one are numerous to say the least, but I've included it in full for completeness.

The algebra included after that line has many unnecessary steps, as well as multiple cancelling sign errors. Most notably, going from the second line to the third does not distribute the negative sign correctly. Picking up after the second line, I would simply have:

y_n+1 = y_n + (1 - x * y_n^2) / y_n^2 * (y_n^3 / 2)

y_n+1 = y_n + (1 - x * y_n^2) * (y_n / 2)

y_n+1 = y_n + y_n / 2 - x * y_n^3 / 2

y_n+1 = 3/2 * y_n - x * y_n^3 / 2

y_n+1 = y_n * (3/2 - x * y_n^2 / 2)

y_n+1 = y_n * (1.5 - 0.5 * x * y_n * y_n)

This is fewer than 1/3 of the included steps (between the second line and the end), and has the advantage of being correct in the intermediate steps. I don't think any of my steps are anything other than self-evident to someone who understands algebra, but I'd be happy to entertain competing perspectives.
dahart, 11 months ago

Here's something new that isn't mentioned and might be worth adding to the repo. (Edit: I'm wrong, it is there, I just didn't recognize it.) The magic number in the famous code snippet is not the optimal constant. You can do maybe 0.5% better relative error using a different constant. Maybe at the time it was infeasible to search for the absolutely optimal number, but now it's relatively easy. I also went down this rabbit hole at some point, so I have a Jupyter notebook for finding the optimal magic number for this (1/x^2) and also for (1/x). Anyone wanna know what the optimal magic number is?
omoikane, 11 months ago

I thought the most interesting bit in this article was the link to "How Java's Floating-Point Hurts Everyone Everywhere":

https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf

Written by William Kahan, also known as the Old Man of Floating-Point:

https://news.ycombinator.com/item?id=29042853 - An Interview with the Old Man of Floating-Point (1998)
schmorptron, 12 months ago

I really liked this video about it when I saw it a while ago: https://www.youtube.com/watch?v=p8u_k2LIZyo
qingcharles, 11 months ago

I was building 3D engines a few years before the days of Quake, and having recently watched videos about optimizing the trig code in Super Mario 64, and now these other faster algorithms, it makes me wonder how much more can be squeezed out of old hardware.

I was optimizing my assembly to the nth degree, but optimizing the algorithm is always going to be the real winner.
atleastoptimal, 12 months ago

Something interesting: I've seen this mythologized as evidence of the "cracked engineer" theory, wherein random engineers accidentally stumble upon incredibly complex discoveries in the course of their day-to-day work and expect no fanfare for it, besting the scientists and researchers who take the traditional route but aren't spurred by necessity. People, like xkcd, suggested this was either Carmack or some random employee who figured it out in a few hours when stuck on a problem and then waltzed on to the next one. In truth, as noted in the document, it has a history going back decades in academia; this was simply the most notable time it was implemented.

I think it speaks to an issue I see in the software engineering world where people assume that collaboration is for low-IQ people and that all great innovation comes from some super-genius working alone for long enough, who isn't required to share how they arrived at their findings. I'm sure the mythology of this algorithm is propelled somewhat by the enigmatic character of writing "what the fuck" next to the line adding the constant, implying a mystical element to its utility, arrived at without needing to clarify it in some long boring research paper.
DrNosferatu, 12 months ago

Note that you're actually using 3 distinct magic numbers, not only the single 0x5f3759df, but also 0.5 and 1.5.

So for further accuracy we can instead have:

    conv.i = 0x5F1FFFF9 - (conv.i >> 1);
    conv.f *= 0.703952253f * (2.38924456f - x * conv.f * conv.f);
    return conv.f;

[from Wikipedia]
p0seidon, 11 months ago
Thanks for stealing hours of my productive time going down this rabbit hole.
solarized, 11 months ago

Noob here. Please ELI5: what is this algo used for in Quake 3?

> solve lighting equations

Is it the light or laser effect when the gun is fired?
Exuma, 11 months ago

I tried this in a raytracer today while learning Rust and it made the scene render all white. Lol
jxyxfinite, 12 months ago

Did John not write the code? Or did he copy and paste it from somewhere?
layer8, 12 months ago
What’s up with the diagonal lines in the graph backgrounds?
casey2, 12 months ago

He posts C code and says that it computes the inverse square root, but the code he posted is undefined C, so while it could compute the inverse square root of its input, it could also do a long list of other things.

One of the many reasons you should stay away from real-world languages when talking about algorithms unless you are an expert in the language.