For interest rather than practical use:<p>Creating a 2GiB lookup table is ~10% faster on my machine than method #10. That is, faster on the benchmark in the blog post. A 2GiB lookup table is horrible: it has a setup cost dominated by calloc-ing the array, and it would thrash caches in real code.<p>I also made a solution using a union to split x into two shorts and a 64KiB lookup table; it was ~15% slower. For more expensive functions, lookup tables are an annoying baseline to beat. (Although still often best avoided because of the setup, cache problems, etc.)
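A rough sketch of the 64KiB idea (reconstructed here for illustration; names and the exact union layout aren't my original code):<p><pre><code> static unsigned char isPow16[1u << 16];   /* 64 KiB: isPow16[v] == 1 iff the 16-bit value v is a power of two */

 static void initTable(void)
 {
     /* static arrays start zeroed; just mark the powers of two below 2^16 */
     for (unsigned int v = 1; v < (1u << 16); v <<= 1)
         isPow16[v] = 1;
 }

 static int isPowerOfTwoTable(unsigned int x)
 {
     union { unsigned int u; unsigned short s[2]; } t;   /* which half is "high" depends on endianness, */
     t.u = x;                                            /* but the test below is symmetric so it doesn't matter */
     return (isPow16[t.s[0]] && t.s[1] == 0) ||
            (isPow16[t.s[1]] && t.s[0] == 0);
 }
 </code></pre>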
It seems bizarre to call the first group of methods "decimal based" when the methods don't ever look at the decimal digits of the number being tested. I would call them "arithmetic" or something like that.
The benchmarks are crap. The loop and function-call overhead dominates. They should list the time for an implementation that is just "return x;", to give a baseline number. On my machine I have to run a few thousand runs before I can distinguish between "return x;" and "return ((x != 0) && !(x & (x - 1)));"<p>They both take about 10s for 2^32 iterations.
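A rough sketch of the kind of baseline I mean (the harness and names here are mine, not the article's):<p><pre><code> #include <stdio.h>
 #include <time.h>

 static unsigned int identity(unsigned int x) { return x; }                  /* baseline */
 static unsigned int isPow2(unsigned int x) { return x && !(x & (x - 1)); }  /* method under test */

 static double seconds(unsigned int (*f)(unsigned int))
 {
     volatile unsigned int sink = 0;   /* keep the calls from being optimized away */
     clock_t t0 = clock();
     for (unsigned long long i = 0; i <= 0xFFFFFFFFULL; i++)
         sink += f((unsigned int)i);
     (void)sink;
     return (double)(clock() - t0) / CLOCKS_PER_SEC;
 }

 int main(void)
 {
     printf("return x;  %.1fs\n", seconds(identity));
     printf("bit trick: %.1fs\n", seconds(isPow2));
     return 0;
 }
 </code></pre>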
Recent Intel/AMD CPUs have a POPCNT instruction, which seems like the logical way to do this. Would be interested to see how that performs compared to these implementations.
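For reference, a minimal sketch using the GCC/Clang builtin, which compiles to POPCNT when the target supports it (e.g. with -mpopcnt); I haven't benchmarked it against the article's methods:<p><pre><code> /* a power of two has exactly one bit set */
 static int isPowerOfTwoPopcount(unsigned int x)
 {
     return __builtin_popcount(x) == 1;
 }
 </code></pre>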
A <i>nasty</i> solution, which assumes and abuses IEEE floating point format and demonstrates a couple of things.<p><pre><code> int isPowerOfTwo (unsigned int x)
 {
     int exponent;
     union { unsigned int u; float f; } tmp;

     tmp.f = x;
     exponent = (tmp.u >> 23) - 127;   /* unbias the IEEE single-precision exponent */
     return x == (1 << exponent);
 }
 </code></pre>
One can also cast to a double without losing precision, mask out the exponent and then compare to 1.0. That solution is even nastier, and needs #define's to deal with endianness.<p>Obviously the above solution is not a good idea! Amongst its several problems, casting to floats is <i>really</i> slow. Sometimes I store my integers in doubles throughout my code because it saves conversions, and can be more convenient. (Matlab users routinely store integers as doubles.)<p>What I found interesting/disconcerting was that the above function doesn't behave consistently: with 'gcc -Wall' I get isPowerOfTwo(0)==0, whereas with 'gcc -Wall -O2' I get isPowerOfTwo(0)==1. clang shows the same change in behaviour with optimization level. (For x == 0 the computed exponent is negative, and shifting by a negative amount is undefined behaviour, so the compiler is free to do either.)
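For the curious, here is a rough sketch of that double-based variant (my own illustration, not carefully tested; it assumes IEEE-754 binary64 and a 64-bit integer type, which sidesteps the endianness #define's):<p><pre><code> #include <stdint.h>
 #include <string.h>

 /* Sketch only: convert x to double (exact for 32-bit values), overwrite the
    exponent field with that of 1.0, and compare.  x is a power of two exactly
    when the remaining mantissa bits are all zero. */
 int isPowerOfTwoViaDouble(unsigned int x)
 {
     double d;
     uint64_t bits;

     if (x == 0) return 0;             /* 0.0 needs special-casing */
     d = x;
     memcpy(&bits, &d, sizeof bits);   /* type-pun without a union */
     bits &= ~(0x7FFULL << 52);        /* clear the 11 exponent bits */
     bits |= (uint64_t)1023 << 52;     /* install the exponent of 1.0 */
     memcpy(&d, &bits, sizeof d);
     return d == 1.0;
 }
 </code></pre>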
The decrement test for a power of two can be modified to count bits; it runs in time proportional to the number of set bits, so O(log n) in the worst case.<p><pre><code> int bits(unsigned n)
 {
     int i = 0;
     while (n > 0) { n &= n - 1; i++; }   /* each pass clears the lowest set bit */
     return i;
 }</code></pre>
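(n is then a power of two exactly when bits(n) == 1.)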
I haven't tried it, but I would expect a linear search in the reverse direction to be faster than binary search: on average it inspects just two values, while binary search inspects about five.
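A rough sketch of what I mean (the table and names are mine, not the article's):<p><pre><code> /* scan from the largest power of two downwards; a uniformly random 32-bit
    input usually exceeds or matches an entry within a couple of comparisons */
 static int isPowerOfTwoReverseScan(unsigned int x)
 {
     static const unsigned int powers[32] = {
         1u, 2u, 4u, 8u, 16u, 32u, 64u, 128u, 256u, 512u, 1024u, 2048u,
         4096u, 8192u, 16384u, 32768u, 65536u, 131072u, 262144u, 524288u,
         1048576u, 2097152u, 4194304u, 8388608u, 16777216u, 33554432u,
         67108864u, 134217728u, 268435456u, 536870912u, 1073741824u, 2147483648u
     };
     for (int i = 31; i >= 0; i--) {
         if (x == powers[i]) return 1;   /* found it */
         if (x > powers[i])  return 0;   /* larger than every remaining power */
     }
     return 0;                           /* x == 0 falls through */
 }
 </code></pre>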
Here's another. It's strictly inferior to the x&(x-1) one, but the idea that makes it work is so pretty it seems worth mentioning.<p><pre><code> y = x | (x>>1);
y |= y>>2;
y |= y>>4;
y |= y>>8;
y |= y>>16;
y |= y>>32; // for 64-bit numbers
return y == (x<<1)-1;
</code></pre>
So what's going on here? Well, it's easy enough to locate the <i>lowest</i> set bit in x -- it's x & -x, the same basic idea as the x&(x-1) trick -- but what the code above does is to locate the <i>highest</i> by "filling" rightwards from that bit. After the first line, the 1-bits in y are the 1-bits in x, and the bits one to the right of them. After the second, it's the 1-bits in x shifted right by 0..3 places. After the next, 0..7 places. And so on. Eventually, that comes to "all the 1-bits in x, and everything any distance to their right". So, e.g., 000101000100 -> 000111100110 -> 000111111111, which then doesn't change.<p>The nice thing is that the number of steps this takes goes like log(word width). It's in the same spirit as the 0x33333333 popcount trick.<p>(On some processors -- all x86 ones since 386, for instance -- there's an instruction that does something similar directly, but on at least some x86 processors it's rather slow.)
Is there a reason this won't work? It's the most 'readable' way I could come up with.<p><pre><code> # (Python)
 def is_power_of_two(n):
     import math
     if n <= 0:
         return False
     power = round(math.log(n, 2))
     return 2 ** power == n</code></pre>
I think it would be better to list the worst- and average-case asymptotic analyses of each method, rather than just the run-times. For instance, the run-time of the "Decimal-Based Approaches to Checking for Powers of Two" grows with the size of the number.