TechEcho

13 comments

acqq over 9 years ago

The big constants used in the formula
<pre><code>void to_bin(unsigned char c, char *out)
{
    *(unsigned long long*)out = 3472328296227680304ULL +
        (((c * 9241421688590303745ULL) / 128) & 72340172838076673ULL);
}
</code></pre>
are much more obvious when they remain in hex:
0x3030303030303030 -- the byte array of '0's encoded as ASCII
0x8040201008040201 -- the magic
0x0101010101010101 -- only 0 or 1 to be added to the appropriate array element
Therefore, more readable:
<pre><code>void to_bin(unsigned char c, char *out)
{
    /* endian dependent, works on x86 */
    *(unsigned long long*)out = 0x3030303030303030ULL +
        (((c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL);
}
</code></pre>
The function is of course endian dependent (I'd add a comment about that fact in it), and the explanation of the quoted function on the page starts at "If you shift the constant by 7 bits to the right." The page is the result of an edit where the older function (the one that ends with + c / 128;) was explained first.
Still, nice.
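To see why the multiply, shift, and mask line up, the middle part of the formula can be split into named steps. This is only a sketch: the helper name `spread_bits` is made up for illustration and does not appear on the linked page.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): the middle part of the
   formula, split into named steps. */
uint64_t spread_bits(uint8_t c)
{
    /* The constant has one set bit every 9 positions (bits 0, 9, ..., 63),
       so the product holds copies of c at those offsets. Each copy is
       8 bits wide and the copies are 9 bits apart, so they never overlap;
       the top copy is cut off at bit 63. */
    uint64_t copies = (uint64_t)c * 0x8040201008040201ULL;

    /* After shifting right by 7, bit (7 - m) of c sits at bit 8*m,
       that is, in the lowest bit of byte m. */
    uint64_t shifted = copies >> 7;

    /* Keep only the lowest bit of each byte. */
    return shifted & 0x0101010101010101ULL;
}
```

Adding 0x3030303030303030 to the result then turns each byte into ASCII '0' or '1'. Byte 0 of the value carries the most significant bit of `c`, which is why the string reads correctly once a little-endian machine stores that byte first.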

personjerry over 9 years ago

Awesome. I'm always amazed by bit magic like this. Today I saw one on Stack Overflow and spent a good deal of time failing to understand it: <a href="https://stackoverflow.com/questions/33532045/how-to-swap-first-2-consecutive-different-bits" rel="nofollow">https://stackoverflow.com/questions/33532045/how-to-swap-fir...</a> And of course I'm reminded of the fast inverse square root: <a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root" rel="nofollow">https://en.wikipedia.org/wiki/Fast_inverse_square_root</a>

nialo over 9 years ago

Why isn't this faster than the "standard" loop-based version? The disclaimer at the start says it's not, but I would have thought that four 64-bit math operations would be much faster than a loop with 8 steps, comparisons, and so on.
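For reference, a loop-based version of the kind being compared against might look like the sketch below. It is an assumption about what "standard" means here, not code taken from the linked page, and the name `to_bin_loop` is hypothetical.

```c
/* Hypothetical baseline: one shift, mask, and store per output character.
   Writes exactly 8 characters and no terminating NUL. */
void to_bin_loop(unsigned char c, char *out)
{
    for (int i = 0; i < 8; i++) {
        /* Most significant bit first, so i = 0 reads bit 7. */
        out[i] = (char)('0' + ((c >> (7 - i)) & 1));
    }
}
```

The trip count is a compile-time constant, so a compiler is free to unroll the loop, which may be part of why the gap is smaller than the operation count suggests.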

rjmunro over 9 years ago

Rather than add 0x3030303030303030, you could OR with it because you know there will never be a carry. This may save cycles on some architectures.
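A small sketch of why the two are interchangeable: every byte of 0x3030303030303030 has its lowest bit clear, and the masked value can only set the lowest bit of each byte, so the operands share no set bits. The function names below are hypothetical.

```c
#include <stdint.h>

/* The masked term: only the lowest bit of each byte can be set. */
uint64_t bit_bytes(uint8_t c)
{
    return (((uint64_t)c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL;
}

/* Variant from the thread: add the ASCII '0' bytes. */
uint64_t digits_add(uint8_t c)
{
    return 0x3030303030303030ULL + bit_bytes(c);
}

/* Suggested variant: OR them in instead. 0x30 is 0011 0000 in binary,
   so bit 0 of each byte is free and no carry can occur. */
uint64_t digits_or(uint8_t c)
{
    return 0x3030303030303030ULL | bit_bytes(c);
}
```

Both functions return the same value for every possible input byte.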

anewhnaccount2 over 9 years ago

This is an example of SIMD Within A Register. There are some more neat examples here: <a href="http://aggregate.org/MAGIC/" rel="nofollow">http://aggregate.org/MAGIC/</a>

est over 9 years ago

For Python
<pre><code>>>> import struct
>>> f1 = lambda c: struct.pack('<Q', 3472328296227680304 + (((c * 9241421688590303745) // 128) & 72340172838076673))
>>> f1(1)
'00000001'
>>> f1(2)
'00000010'
>>> f1(5)
'00000101'
>>> f2 = lambda c: struct.pack('<Q', (((c * 0x8040201008040201) // 128) & 0x0101010101010101))
>>> f2(1)
'\x00\x00\x00\x00\x00\x00\x00\x01'
>>> f2(2)
'\x00\x00\x00\x00\x00\x00\x01\x00'
>>> f2(101)
'\x00\x01\x01\x00\x00\x01\x00\x01'
</code></pre>
Use "<" for little-endian byte order, as on x86. The output is shown as Python 2 prints it; Python 3 prints bytes literals such as b'00000001'.

seiji over 9 years ago

Here's a fancier, slightly less magic version. How to compile is left as an exercise for the reader (hint: Haswell or newer, requires BMI2 instruction set):
<pre><code>#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

void toBinary(uint8_t byte)
{
    uint64_t zeroes = *(uint64_t *)"00000000";
    uint64_t resultStr = zeroes | __builtin_bswap64(_pdep_u64(byte, 0x0101010101010101ULL));
    printf("hello: %.*s\n", 8, (char *)&resultStr);
}</code></pre>
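For readers without BMI2 hardware, the effect of the parallel bit deposit can be sketched in plain C. This is an illustrative software stand-in with a made-up name (`pdep_soft`), not the intrinsic itself and certainly not as fast: it places the low bits of `src`, in order, at the positions of the set bits of `mask`.

```c
#include <stdint.h>

/* Hypothetical software stand-in for a parallel bit deposit:
   bit 0 of src goes to the lowest set bit of mask, bit 1 of src
   to the next set bit of mask, and so on. */
uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that bit */
    }
    return result;
}
```

With the mask 0x0101010101010101, bit i of the input lands in the lowest bit of byte i, so the least significant bit ends up in the lowest byte. That ordering is the reverse of the multiply trick, which is why the snippet above applies a byte swap before printing.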

zamalek over 9 years ago

> x86 is little endian - that's why it's backwards
This can be read the wrong way, although it is correct: the author is referring to the final byte sequence and not the input bit sequence. Endianness applies to bytes, not bits.
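A sketch of what that distinction means in code: to get the same string on a big-endian host, only the order of the eight bytes has to be reversed, while the bits inside each byte stay where they are. All names below are hypothetical, and the runtime endianness probe is just one simple way to do the check.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the order of the 8 bytes; bits within each byte are untouched. */
uint64_t reverse_bytes(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Returns 1 if the lowest-addressed byte of an integer is its least
   significant byte. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Writes 8 characters, most significant bit first, on either kind of host. */
void to_bin_any_endian(uint8_t c, char out[8])
{
    uint64_t digits = 0x3030303030303030ULL
        + ((((uint64_t)c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL);
    if (!host_is_little_endian())
        digits = reverse_bytes(digits);  /* fix byte order only */
    memcpy(out, &digits, sizeof digits);
}
```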

eru over 9 years ago

Nice little problem. Thanks for sharing!

rlonstein over 9 years ago

Cool, but I want to say I've seen something similar. Google turns up: <a href="http://www.asmcommunity.net/forums/topic/?id=28498" rel="nofollow">http://www.asmcommunity.net/forums/topic/?id=28498</a>

nwmcsween over 9 years ago

I recommend reading over this: <a href="http://0x80.pl/articles/convert-to-bin.html" rel="nofollow">http://0x80.pl/articles/convert-to-bin.html</a> Btw, all this SWAR stuff should be aligned before applying.

jjnoakes over 9 years ago

Be careful with alignment. On architectures where it matters, the caller may pass in a "char* out" which is not suitably aligned, and doing an "unsigned long long" store into that address may fault.
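One common way to sidestep the problem is to build the value in a local variable and copy it out with `memcpy`, which has no alignment requirement on the destination. The sketch below assumes that approach; the name `to_bin_memcpy` is hypothetical, and the result is still endian dependent, as in the original function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical variant: same arithmetic, but the store goes through
   memcpy, so `out` may point at any byte address. Still endian
   dependent: the string reads MSB-first only on little-endian hosts. */
void to_bin_memcpy(unsigned char c, char *out)
{
    uint64_t digits = 0x3030303030303030ULL
        + ((((uint64_t)c * 0x8040201008040201ULL) >> 7) & 0x0101010101010101ULL);
    memcpy(out, &digits, sizeof digits);  /* copies exactly 8 bytes */
}
```

Compilers typically turn a fixed-size `memcpy` like this into a single store where the target allows it, so the portable form usually costs little or nothing.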

amelius over 9 years ago

Feature request: add a parameter that specifies the base into which the function rewrites the number.

8-bit number to binary string
