Bit Twiddling Hacks

143 pointsby Cieplakover 4 years ago

15 comments

sjrdover 4 years ago

This kind of hacks is a major part of how we made Scala.js' 64-bit integers fast (the fastest known implementation of 64-bit integers on JavaScript). See [1] for the implementation, and [2] at section 4.4 for the long explanation with benchmarks and the other tricks we use.[1] <a href="https://github.com/scala-js/scala-js/blob/v1.3.1/linker-private-library/src/main/scala/org/scalajs/linker/runtime/RuntimeLong.scala" rel="nofollow">https://github.com/scala-js/scala-js/blob/v1.3.1/linker-priv...</a>[2] <a href="https://lampwww.epfl.ch/~doeraene/thesis/doeraene-thesis-2018-cross-platform-language-design.pdf" rel="nofollow">https://lampwww.epfl.ch/~doeraene/thesis/doeraene-thesis-201...</a>

jsheardover 4 years ago

My favorite thing about these is how they come full circle - programmers find convoluted ways to outsmart the compiler in some primitive operation, but then CPU manufacturers add dedicated instructions for those operations, and eventually the compilers outsmart the programmer by deleting their smartass method and emitting a single instruction instead<a href="https://godbolt.org/z/nPdrcz" rel="nofollow">https://godbolt.org/z/nPdrcz</a>

评论 #25301765 未加载

klmrover 4 years ago

At my last job interview I had to implement popcount and explain how to optimise it (amongst other things).I was able to jot down a naïve implementation and the obvious optimisation based on a lookup tables. I only vaguely remembered the Bit Twiddling treatment of the subject but, with a bit of nudging from the interviewer, I managed to implement and explain the variant that runs in O(set bits) (“Brian Kernighan's way”). I got the job.Now, it’s fashionable to deride this this kind of code interview as unrealistic and unhelpful. But in my first week on the job, by sheer coincidence, I had to use the function. Obviously there are existing, efficient implementations, including intrinsics. But knowing how to derive an efficient implementation certainly didn’t harm. My job has since evolved into different responsibilities but low-level algorithmic knowledge is still important. I’m not sure testing for it in job interviews is generally a good idea, and designing good job interviews is certainly a big topic. But in my particular case it happened to be a relevant, fair test of my abilities.

评论 #25302513 未加载

评论 #25302243 未加载

lou1306over 4 years ago

These are typically seen as tricks for embedded programming. But they have (at least) another useful application.Years ago I was messing around with SAT-based bounded model checking of C programs that I was crafting. Many of these tricks allow you to remove branches/loops from your program, which is great because you end up with a smaller SAT formula, which (usually) can be solved faster.

ncmncmover 4 years ago

This list is super-great, except: all modern targets[0] (and many non-modern, including Alpha, Cray, CDC, Stretch) have the POPCOUNT instruction, which does a key operation underlying a large portion of the target operations much faster than presented. It provides an order-of-magnitude speedup in numerous algorithms. The list should have a comprehensive treatment.At the moment, the only ISO Standard way to say "popcount" is in C++, with e.g.<pre><code> std::bitset<64>(x).count() </code></pre> But even this is insufficient on MSVC, which targets pre-2002 arm64, which lacked it; and on gcc and clang on amd64, similarly, absent a -fpopcnt or -march=native or related option (which there are numerous other reasons to use).Given -march=native or =core2 or various other means, optimizers will happily rewrite the Kernighan loop into a straight-up POPCNT instruction. Anyway, all compilers provide it as a non-standard, therefore variously-spelled, intrinsic, but (except on MSVC) only actually produce it if the "-march=" or related commad-line option enables that.Historically, it is common for instruction sets to start out lacking the instruction, and then getting it in subsequent releases because of customer demand. Also historically, a key such customer has frequently been the US NSA. Thus, initial and64, alpha, POWER, and SPARC lacked it, but soon got it. x86 ("ia32") is perhaps the principal laggard. When a feature is added, at great expense, to a subsequent ISA revision, that says a lot for its importance.[0] All except RISC-V, to date. Some would say this makes RISC-V non-modern. It appears in the (still) unratified B extension, which (therefore) nobody implements.

stfwnover 4 years ago

I once had to count the number of set bits (the Hamming weight) for an assignment. It had to be done in under 40 bitwise ops, without loops and with constants of up to 8 bits. The naive approach took way too many ops! I was stumped until I found the linked page, in particular 'Counting bits set, in parallel'[1]. Parallelism at the bit level?! That is when I realized low level programming was completely badass.[1]: <a href="https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel" rel="nofollow">https://graphics.stanford.edu/~seander/bithacks.html#CountBi...</a>

评论 #25301782 未加载

gorjusborgover 4 years ago

If you're into this sort of thing and like print books, "Hacker's Delight" by Henry S. Warren, Jr. is great.

评论 #25305873 未加载

ape4over 4 years ago

I recalled seeing a nicer sign() or sgn() function. Here it is:<pre><code> template <typename T> int sgn(T val) { return (T(0) < val) - (val < T(0)); } </code></pre> Not mine, from here <a href="https://stackoverflow.com/questions/1903954/is-there-a-standard-sign-function-signum-sgn-in-c-c" rel="nofollow">https://stackoverflow.com/questions/1903954/is-there-a-stand...</a>

jribover 4 years ago

These slides on Bit Hacks are interesting too if you're looking for a more verbose presentation on some of these:<a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2018/lecture-slides/MIT6_172F18_lec3.pdf" rel="nofollow">https://ocw.mit.edu/courses/electrical-engineering-and-compu...</a>Also, curiously enough, I was on the page in the submission after a co-worker asked yesterday if there was a cleaner way to do:<pre><code> return direction === 'asc' ? value : -value; </code></pre> Was fun to work through how <a href="https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate" rel="nofollow">https://graphics.stanford.edu/~seander/bithacks.html#Conditi...</a> works. (no, we didn't change our javascript code to use the bit hack)

rectangover 4 years ago

Having written my fair share of bit-twiddling hacks in low-level code, my takeaway is that because the hack code doesn't resemble what it actually does, it's all too easy to create bugs. Unit testing the living daylights out of any bit-twiddling code path is a must.Also helpful: being able to represent integer types with binary. Since I'm not a genius and I write code to be maintained by other not-geniuses, `mask = 0b_1010_1010` is clearer than `mask = 0xAA`.

flohofwoeover 4 years ago

Another simple bit-twiddling-hack (it has a proper name which I unfortunately forgot) is SIMD without SIMD instructions by masking out the top-level bits (which are essentially the carry-flags), e.g. doing eight 7-bit additions at once:<pre><code> uint64_t a = ((a0 & 0x7F) | ((a1 & 0x7F)<<8) ...; uint64_t b = ((b0 & 0x7F) | ((b1 & 0x7F)<<8) ...; uint64_t c = (a + b) & 0x7F7F7F7F7F7F7F7F;</code></pre>

评论 #25301039 未加载

bitcharmerover 4 years ago

We use bit twiddling a lot to reduce conditional jumps. This website is a great starting point if you need to learn it. Highly recommend.

oytisover 4 years ago

A trick I used to get the mask of least significant set bit in a bit mask: `flag = bitmask & (-bitmask)`. Works with two's complement numbers only of course. (and modern architectures have a dedicated command for this)

mschuetzover 4 years ago

The interleaving/morton code bit hacks are great. They result in serious performance improvements if you need to compute morton codes.

almogover 4 years ago

TIL these two lines (to get an integer absolute value without branching) are _patented_:<pre><code> int const mask = v >> sizeof(int) * CHAR_BIT - 1; r = (v ^ mask) - mask;</code></pre>