The article author talks about the number of CPU ops as if they all cost the same. Since he's using compile-time constants, he probably never noticed that division (20-100 cycles) is so horribly slow: the compiler uses a trick like in Hacker's Delight to convert division by a constant into multiplication (~4 cycles). And shifts and masks are cheap (1 cycle each). This kind of trickery is rarely worth the small space savings (to be fair, the author seems aware of that, calling it "immature optimization").
This is kind of an abuse of C/C++ bitfields. Arguably you don't gain much by using them as opposed to a memory chunk and simple accessors. Bitfields are a way to get an explicit memory layout, but using them like this subverts word alignment, word sizes, etc...<p>Is there any reason you didn't just pre-process the values and use a Huffman coding on them?
In the context of the overall algorithm, wouldn't you cluster the movie ratings first anyway (O(n)), so that most of the algorithm computes on those clusters, which hold less data than the individual movies to begin with? I'd worry about minimizing the size of the clusters instead.
You'll also probably want some kind of hierarchical data structure to use the cache efficiently. This doesn't help with that.
If the movie ratings are sorted, a lot of the data becomes redundant and can be left out entirely.
Those values should all be 0-based too, since that's what lets you save bits: a single rating takes 3 bits either way (both 5 and 4 need 3 bits in binary), but 0-based values can be packed arithmetically at log2(5) ≈ 2.33 bits per rating. With that "optimization" alone you save more bits than with the early bit packing attempt.
The solution with primes also doesn't scale well. The bigger the primes get, the more empty space you create.
Are there any languages that <i>suggest</i> optimizations? Where you can explicitly enable them?<p>This seems like something a compiler could do relatively easily: have a bunch of different ways of storing structs (word-aligned, size-aligned, bit-aligned, modulo/remainder, modulo-next-prime, modulo-next-easy-prime, etc. {what else am I missing here?}) and let you specify via an annotation which one you want, or tell the compiler your typical sizes and let it choose from there.
It doesn't actually use bitfields; it describes an alternative to them that was a better choice in this specific instance.<p>Huffman would have been much slower to access, which would have been unacceptable for the use case (iterating 100 million data points multiple times a second).
Awesome, I wondered whether something like this might be possible when I first found out about bitfields, but maths isn't my strong point (and certainly wasn't 20 years ago), so I just forgot about it.