The article author talks about the number of CPU ops as if they all cost the same. Since he's using compile-time constants, he probably never noticed that division (20-100 cycles) is so horribly slow: the compiler uses a trick like in Hacker's Delight to convert division by a constant into multiplication (~4 cycles). And shifts and masks are cheap (1 cycle each). This kind of trickery is rarely worth the small space savings (to be fair, the author seems aware of that, calling it "immature optimization").
This is kind of an abuse of C/C++ bitfields. Arguably you don't gain much by using them as opposed to a memory chunk and simple accessors. Bitfields are a way to get an explicit memory layout, but using them like this subverts word alignment, word sizes, etc...<p>Is there any reason you didn't just pre-process the values and use a Huffman coding on them?
In the context of the overall algorithm, wouldn't you cluster the movie ratings first anyway (O(n)), so that most of the algorithm computes on those clusters, which hold less data than the individual movies to begin with? I'd worry about minimizing the size of the clusters instead.
You'll also probably want some kind of hierarchical data structure to use the cache efficiently. This doesn't help with that.
If the movie ratings are sorted, a lot of the data becomes redundant and can be left out entirely.
Those values should all be 0-based too, since that's what lets you save bits: a single rating takes 3 bits either way (both 5 and 4 need 3 bits in binary), but 0-based values can be packed arithmetically at log2(5) ≈ 2.33 bits per rating. With that "optimization" alone you save more bits than with the early bit packing attempt.
The solution with primes also doesn't scale well. The bigger the primes get, the more empty space you create.
Are there any languages that <i>suggest</i> optimizations? Where you can explicitly enable them?<p>This seems like something a compiler could do relatively easily: have a bunch of different ways of storing structs (word-aligned, size-aligned, bit-aligned, modulo/remainder, modulo-next-prime, modulo-next-easy-prime, etc. {what else am I missing here?}) and let you specify via an annotation which one you want, or tell the compiler your typical sizes and let it choose from there.
It doesn't actually use bitfields; it describes an alternative to them that was a better choice in this specific instance.<p>Huffman would have been much slower to access, which would have been unacceptable for the use case (iterating 100 million data points multiple times a second).
Awesome, I wondered whether something like this might be possible when I first found out about bitfields, but maths isn't my strong point (and certainly wasn't 20 years ago), so I just forgot about it.