The big constants used in the formula<p><pre><code> void to_bin(unsigned char c, char *out) {
*(unsigned long long*)out = 3472328296227680304ULL +
(((c * 9241421688590303745ULL) / 128) & 72340172838076673ULL);
}
</code></pre>
are much more obvious when they remain in hex:<p>0x3030303030303030 -- the byte array of '0's encoded as ASCII<p>0x8040201008040201 -- the magic<p>0x101010101010101 -- only 0 or 1 to be added to the appropriate array element<p>Therefore, more readable:<p><pre><code> void to_bin(unsigned char c, char *out) {
/* endian dependent, works on x86 */
*(unsigned long long*)out = 0x3030303030303030ULL +
(((c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL);
}
</code></pre>
The function is of course endian dependent (I'd add a comment about that fact in it) and the explanation of the quoted function on the page starts at "If you shift the the constant by 7 bits to the right." The page is a result of an edit where the older function (that ends with + c / 128;) was explained first.<p>Still, nice.
Awesome. I'm always amazed by bitmagic like this. Today I saw one on stackoverflow and spent a good deal of time failing to understand it:<p><a href="https://stackoverflow.com/questions/33532045/how-to-swap-first-2-consecutive-different-bits" rel="nofollow">https://stackoverflow.com/questions/33532045/how-to-swap-fir...</a><p>And of course I'm reminded of the fast inverse square root:<p><a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root" rel="nofollow">https://en.wikipedia.org/wiki/Fast_inverse_square_root</a>
Why isn't this faster than the "standard" loop based version? The disclaimer at the start says it's not, but I would have thought that 4 64bit math operations would be much faster than a loop with 8 steps and comparisons and so on.
This is an example of SIMD Within A Register. There are some more neat examples here: <a href="http://aggregate.org/MAGIC/" rel="nofollow">http://aggregate.org/MAGIC/</a>
Here's a fancier, slightly less-magic, version. How to compile is left as an exercise for the reader (hint: Haswell or newer, requires BMI2 instruction set):<p><pre><code> void toBinary(uint8_t byte) {
uint64_t zeroes = *(uint64_t *)"00000000";
uint64_t resultStr =
zeroes | __builtin_bswap64(_pdep_u64(byte, 0x0101010101010101ULL));
printf("hello: %.*s\n", 8, (char *)&resultStr);
}</code></pre>
> x86 is little endian - that's why it's backwards<p>This can be read the wrong way, although it is correct: the author is referring to the final byte sequence and not the input bit sequence. Endianness applies to bytes, not bits.
Cool but I want to say I've seen similar, google turns up:
<a href="http://www.asmcommunity.net/forums/topic/?id=28498" rel="nofollow">http://www.asmcommunity.net/forums/topic/?id=28498</a>
I recommend reading over this: <a href="http://0x80.pl/articles/convert-to-bin.html" rel="nofollow">http://0x80.pl/articles/convert-to-bin.html</a><p>btw all this SWAR stuff should be aligned before applying.
Be careful with alignment. On architectures where it matters, the caller may pass in a "char* out" which is not suitably aligned, and doing an "unsigned long long" store into that address may fault.