> conspiracy theories around popcount and the NSA<p>Whoa. Some links (with sometimes not-very-well-thought-out allegations):<p><a href="https://groups.google.com/forum/#!topic/comp.arch/UXEi7G6WHuU" rel="nofollow">https://groups.google.com/forum/#!topic/comp.arch/UXEi7G6WHu...</a><p><a href="https://www.schneier.com/blog/archives/2014/05/the_nsa_is_not_.html" rel="nofollow">https://www.schneier.com/blog/archives/2014/05/the_nsa_is_no...</a>
I like that it compiles to a single instruction, but lots of languages already have this; it's pretty common.<p>Even boring old Java has had Integer.bitCount for many years.
I did some experimentation with popcnt in C++ six years ago and was impressed to find that the intrinsic in gcc was faster than the best inline assembly I could come up with using the popcnt instruction:<p><a href="https://github.com/bobbyi/Fast-Bit-Counting" rel="nofollow">https://github.com/bobbyi/Fast-Bit-Counting</a>
Now that most CPUs have population count, it may as well be exposed at the language level.<p>Doing it by table lookup results in questions such as "am I wasting too much cache space on this?" and "is a 64K table causing cache misses".
Exactly what purpose does the surrounding instructions serve in this and similar simple cases? Is it compiler dogma or a missed uncommon optimization?<p><pre><code> push rbp
mov rbp, rsp
popcnt eax, edi
pop rbp
ret</code></pre>
It's exposed as an intrinsic in llvm. C(99? I think) surfaces these in math.h, although the standard is silent on some edge cases (all zeros), and numbers are autopromoted to 32 bit integer. Julia surfaces these as well.<p>In case you're wondering what this could be useful for besides super secret NSA stuff, and Bitcoin mining, here are a few suggestions:<p>1) hyperloglog. (Similar to Bitcoin). Keep an estimated count of items streaming by by hashing them and store the highest lzcount of the hashes for each category you're tracking. This will be ~ log2(category count)<p>2) converting from fixed point to floating point. The number of zeros in front of your value represents the exponent of your value (or ones in the case of a 2's complement negative fixed point), which is critical to deducting the float representation.<p>Along those lines, one of the things I've done is implemented floating point-like datatypes, which extensively uses lzcount and locount for tracking values and also will use tzcount to measure if the values are exact or not.<p><a href="https://github.com/Etaphase/FastSigmoids.jl/blob/master/README.md" rel="nofollow">https://github.com/Etaphase/FastSigmoids.jl/blob/master/READ...</a>
I've been using bit population count to traverse packed sparse arrays since the CDC 6600, so it's handy for more things than Hamming distance. Always nice to have in hardware, but pretty cheap to synthesize when it isn't.
Reading <a href="http://0x80.pl/articles/sse-popcount.html" rel="nofollow">http://0x80.pl/articles/sse-popcount.html</a>, I would think the hardware instruction is slower than the best software implementation. Or has that changed on newer hardware?
This submission originally linked to a blog post whose author (not the HN submitter) asked us to delete it. We don't want to kill the thread, so we removed the URL above.
> Whereas GCC’s __builtin_popcount() only works on unsigned ints Rust’s count_ones() works on signed, too!<p>C's weak typing should handle signed
<i>I was once asked to come up with a table lookup method for popcount on the spot and could not come up with a solution.</i><p>Oh, Hacker News.<p>If someone can't solve a problem like this off the top of their head, does it not act as a strong signal that they are a beginner and you should probably look elsewhere for quality information?