Curiously enough, such a hack is completely pointless today (at least on x86): the <i>rsqrtps</i> instruction computes 4 single-precision reciprocal square roots in just 3 clock cycles, with higher accuracy to boot.<p>Most modern instruction sets with remotely decent floating point support have a similar instruction, in large part <i>because</i> of the prevalence of hacks like this in the past.
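<p>For anyone who wants to try it, here is a minimal sketch of how <i>rsqrtps</i> is usually reached from C, via the <i>_mm_rsqrt_ps</i> intrinsic from <i>xmmintrin.h</i>. The helper name <i>rsqrt4</i> and the <i>HAVE_SSE</i> macro are my own, not part of any library, and the scalar fallback is only there so the snippet still compiles on non-x86 targets.

```c
#include <math.h>

/* HAVE_SSE is a hypothetical macro for this sketch, not a standard one. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: approximate 1/sqrt(x) for four floats at once. */
static void rsqrt4(const float in[4], float out[4])
{
#if HAVE_SSE
    __m128 x = _mm_loadu_ps(in);   /* unaligned load of 4 floats */
    __m128 y = _mm_rsqrt_ps(x);    /* compiles to a single rsqrtps */
    _mm_storeu_ps(out, y);         /* unaligned store of 4 results */
#else
    /* Portable fallback: exact scalar computation, no SIMD. */
    for (int i = 0; i < 4; ++i)
        out[i] = 1.0f / sqrtf(in[i]);
#endif
}
```

Intel documents the relative error of the approximation as at most 1.5 * 2^-12, so the results are estimates rather than correctly rounded values. If that is not enough, one Newton-Raphson step on top of the estimate is the usual refinement.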