Dense bitpacking

34 points by hywel almost 10 years ago

10 comments

eloff almost 10 years ago
The article author talks of the number of CPU ops as if they were all the same. Since he's using compile-time constants, he probably never noticed how horribly slow division (20-100 cycles) is. The compiler uses a trick like the one in Hacker's Delight to convert division by a constant into multiplication (4 cycles). But shifts and masks are cheap (1 cycle each). This kind of trickery is rarely worth the small space savings (to be fair, the author seems aware of that, calling it "immature optimization").
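To make the cycle-count point concrete, here is a minimal C++ sketch (illustrative field ranges and names, not the article's code) contrasting the two unpacking styles: dense packing decoded with constant division/modulo, which the compiler rewrites as multiply-and-shift, versus plain shift-and-mask packing.

    #include <cstdint>

    // Three ratings, each in [0, 5). Dense packing stores them as one
    // mixed-radix number (125 states, ~7 bits); shift packing spends
    // 3 bits per field (9 bits) but decodes with 1-cycle shifts/ANDs.
    uint32_t pack_dense(uint32_t r0, uint32_t r1, uint32_t r2) {
        return r0 + 5u * r1 + 25u * r2;
    }
    uint32_t unpack_dense(uint32_t v, int i) {
        for (int k = 0; k < i; ++k) v /= 5u; // division by the constant 5
        return v % 5u;                       // compiled to multiply+shift
    }
    uint32_t pack_shift(uint32_t r0, uint32_t r1, uint32_t r2) {
        return r0 | (r1 << 3) | (r2 << 6);
    }
    uint32_t unpack_shift(uint32_t v, int i) {
        return (v >> (3 * i)) & 0x7u;        // shift + mask, 1 cycle each
    }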
uxcn almost 10 years ago
This is kind of an abuse of C/C++ bitfields. Arguably you don't gain much by using them as opposed to a memory chunk and simple accessors. Bitfields are a way to get an explicit memory layout, but using them subverts word alignments, word sizes, etc.

Is there any reason you didn't just pre-process the values and use a Huffman coding on them?
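For contrast, a hedged sketch of the two layouts mentioned here (field widths are made up for illustration): a C++ bitfield, where the compiler picks the exact bit placement, versus a raw word with hand-written accessors, where the layout is exactly what the code says.

    #include <cstdint>

    // Compiler-controlled layout: bit placement and padding are
    // implementation-defined, which is the "abuse" concern.
    struct RatingBitfield {
        unsigned rating : 3;  // e.g. a 1-5 rating
        unsigned flags  : 5;
    };

    // Explicit layout: a plain word plus simple accessors.
    struct RatingWord {
        uint32_t bits;
        uint32_t rating() const { return bits & 0x7u; }
        void set_rating(uint32_t r) { bits = (bits & ~0x7u) | (r & 0x7u); }
    };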
eska almost 10 years ago
In the context of the overall algorithm, wouldn't you cluster the movie ratings first anyway (O(n)), so most of the algorithm would do computations on those clusters, which would have less data than single movies to begin with? I'd worry about minimizing the size of the clusters instead. You'll also probably want some kind of hierarchical data structure to use the cache efficiently; this doesn't help with that. If the movie ratings are sorted, a lot of the data becomes redundant and can be left out entirely. Those values should all be 0-based too, since it saves you some bits (3 bits for 1-5 vs. 2 bits for 0-4). With that "optimization" alone you save more bits than with the early bit-packing attempt. The solution with primes also doesn't scale well: the bigger the primes get, the more empty space you create.
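A small sketch of the "sorted data becomes redundant" point (one reading of it, with illustrative names): once IDs are sorted, storing the gaps between neighbours instead of full IDs leaves much smaller numbers to pack.

    #include <cstdint>
    #include <vector>

    // Replace sorted movie IDs with the gaps between consecutive IDs;
    // the gaps are typically tiny and need far fewer bits per entry.
    std::vector<uint32_t> to_deltas(const std::vector<uint32_t>& sorted_ids) {
        std::vector<uint32_t> deltas;
        deltas.reserve(sorted_ids.size());
        uint32_t prev = 0;
        for (uint32_t id : sorted_ids) {
            deltas.push_back(id - prev);
            prev = id;
        }
        return deltas;
    }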
TheLoneWolfling almost 10 years ago
Are there any languages that *suggest* optimizations? Where you can explicitly enable them?

Something like this seems like something a compiler could relatively easily do. Have a bunch of different ways of storing structs (word-aligned, size-aligned, bit-aligned, modulo/remainder, modulo-next-prime, modulo-next-easy-prime, etc. {what else am I missing here?}), and be able to specify to the compiler via an annotation or similar which you want, or what your typical sizes are, and let the compiler choose from there.
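The closest existing analogue may be per-struct layout attributes in GCC/Clang, which cover the word-aligned vs. byte-packed cases (though nothing as exotic as the modulo schemes suggested here); a sketch:

    #include <cstdint>

    struct Natural { uint8_t rating; uint32_t movie; };  // typically padded to 8 bytes

    struct __attribute__((packed)) Packed {              // GCC/Clang attribute
        uint8_t  rating;
        uint32_t movie;
    };                                                   // exactly 5 bytes

    static_assert(sizeof(Packed) == 5, "packing removed the padding");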
hywel almost 10 years ago
It doesn't actually use bitfields; it describes an alternative to them that was a better choice in this specific instance.

Huffman would have been much slower to access, which would have been unacceptable for the use case (iterating 100 million data points multiple times a second).
stuaxo almost 10 years ago
Awesome. I wondered whether something like this might be possible when I first found out about bitfields, but maths isn't my strong point (and certainly wasn't 20 years ago), so I just forgot about it.
periodontal almost 10 years ago
The fastest way to encode these dense bitpackings is almost certainly with the Chinese remainder theorem.
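A minimal sketch of one way such a CRT encoding could look (illustrative moduli, not necessarily what the commenter has in mind): with coprime field sizes, each field is recovered with a single modulo rather than peeling fields off sequentially.

    #include <cstdint>

    // Fields a in [0,5) and b in [0,7); 5 and 7 are coprime, so the CRT
    // gives a unique x in [0,35) with x % 5 == a and x % 7 == b.
    // The constants work because 21 % 5 == 1, 21 % 7 == 0,
    // and 15 % 5 == 0, 15 % 7 == 1.
    uint32_t crt_encode(uint32_t a, uint32_t b) { return (21u * a + 15u * b) % 35u; }
    uint32_t crt_field_a(uint32_t x) { return x % 5u; }  // one modulo per field
    uint32_t crt_field_b(uint32_t x) { return x % 7u; }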
hywel almost 10 years ago
Although obviously there's still a preprocessing step to make that possible.
nickpsecurity almost 10 years ago
Clever and interesting scheme for encoding the data.
mc_hammer almost 10 years ago
Good read, cheers.