The code the article links to for zipmap.c (<a href="https://github.com/antirez/redis/blob/unstable/src/zipmap.c" rel="nofollow">https://github.com/antirez/redis/blob/unstable/src/zipmap.c</a>) is <i>rather</i> literate.<p>I haven't dug extremely deeply into the sources of many F/OSS projects; the code I've been interested in reading has invariably been opaque (at least to my inexperienced eyes). This particular source file (and maybe the rest of Redis?) is really good. I think I'll be taking many more looks at Redis (code- and usage-wise) in the future.
1 million pairs using 16 MB is about 16 bytes per pair, which is perfectly fine but nothing impressive.<p>The dataset is static, so a simple naive solution would be to create one big array sorted by key. Assuming photo and user IDs take 4 bytes each, the full dataset would come to roughly 2.4GB (300 million pairs × 8 bytes). Then use binary search to look up values.<p>However, if we really wanted to reduce the size further, we could build a finite state machine from the dataset (maybe reversing the values to increase the amount of shared suffixes), which should reduce the size by another order of magnitude.
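For illustration, here's a minimal Python sketch of that naive sorted-array approach (the toy data and names are mine, not from the article); two parallel 4-byte arrays give the 8 bytes per pair assumed above:<p>
    from array import array
    from bisect import bisect_left

    # Toy stand-in for the real dataset; assumes the pairs have
    # already been sorted by photo ID.
    photo_ids = array('I', [17, 42, 99])   # 4-byte keys
    user_ids  = array('I', [ 3,  7, 12])   # 4-byte values

    def lookup(photo_id):
        i = bisect_left(photo_ids, photo_id)   # O(log n) binary search
        if i < len(photo_ids) and photo_ids[i] == photo_id:
            return user_ids[i]
        return None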
<i>Best of all, lookups in hashes are still O(1), making them very quick.</i><p>How quick is "very quick"? I was hoping to see some performance benchmarks, not just memory usage benchmarks.
First, I love Redis :-)<p>Second, this functionality seems to be a stopgap to support old users who may be using old clients. So they need an array of 300 million elements, each containing an integer of 12 million or less. Assuming 32-bit values (24 bits would work, but... efficiency), that's a 1,144MB array which, in theory, could be stored as a file and served through the filesystem cache.<p>I wonder how the performance of that would stack up against Redis. The convenience of Redis being a network daemon out of the box is potentially the big win here, though the memory usage even in the optimized case seems to be around 4x, given that it's doing more than the task strictly requires (from my interpretation of it - I could be wrong!)
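A rough Python sketch of that file-backed idea, assuming a hypothetical pre-built flat file "user_ids.bin" where the user ID for each photo is stored as 4 big-endian bytes at offset photo_id * 4:<p>
    import mmap
    import struct

    # "user_ids.bin" is a hypothetical flat file, not anything from
    # the article: one 4-byte user ID per photo ID, in photo-ID order.
    with open('user_ids.bin', 'rb') as f:
        data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def lookup(photo_id):
        # Hot pages stay in the OS filesystem cache, as suggested above.
        return struct.unpack_from('>I', data, photo_id * 4)[0]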
You can find similar examples of this and other Redis memory optimizations, including Ruby code for the exact implementation described by Instagram, at the Redis site: <a href="http://redis.io/topics/memory-optimization" rel="nofollow">http://redis.io/topics/memory-optimization</a>
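For reference, a minimal Python sketch of the bucketing scheme described there and in the Instagram post (bucket size and key prefix follow the write-up; the redis-py client and a local server are assumptions):<p>
    import redis  # assumes redis-py and a local Redis server

    r = redis.Redis()
    BUCKET = 1000  # keys per hash, as in the write-up

    def store(photo_id, user_id):
        # Group every 1000 photo IDs into one small hash so Redis can
        # keep each hash in its compact zipmap encoding.
        r.hset('mediabucket:%d' % (photo_id // BUCKET), photo_id, user_id)

    def fetch(photo_id):
        return r.hget('mediabucket:%d' % (photo_id // BUCKET), photo_id)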
Lookups are not really O(1); they're O(number of keys per hash) as long as the hashes are encoded as zipmaps. Once they are converted to full-blown hash tables, the memory usage increases.<p>Still, this is a very good way to store a lot of key/value pairs in Redis.
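That conversion threshold is configurable; a small sketch using redis-py (the hash-max-zipmap-* names are from the Redis 2.x era this discussion concerns; later versions renamed them hash-max-ziplist-*):<p>
    import redis  # assumes redis-py and a local Redis server

    r = redis.Redis()

    # A hash stays zipmap-encoded only while it has at most this many
    # fields and every value fits in this many bytes; past either
    # limit, Redis silently converts it to a real hash table.
    r.config_set('hash-max-zipmap-entries', 1024)
    r.config_set('hash-max-zipmap-value', 64)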
Why use clear-text numbers? Most of the time you're going to be using large numbers, so binary-pack them and save more space.<p>I had the same issue: normal storage was 1.1GB of space, HSET got it down to 200MB, and binary packing every integer brought it right down to 163MB of memory (32-bit instance). For that 163MB, I was slicing an MD5 of the field to get the HSET prefix, packing that, and then using the remainder as the HSET suffix (due to the data format of the input field).
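A hypothetical Python reconstruction of that scheme (the slice widths and names are my guesses, not the commenter's actual code); the MD5 digest is split into a binary hash key plus a field, and the integer value is packed into 4 bytes instead of a decimal string:<p>
    import hashlib
    import struct
    import redis  # assumes redis-py and a local Redis server

    r = redis.Redis()

    def store(field, value):
        # Slice the raw MD5: first bytes become the HSET key (prefix),
        # the remainder becomes the HSET field (suffix).
        digest = hashlib.md5(field.encode()).digest()
        prefix, suffix = digest[:2], digest[2:]
        # Pack the integer into 4 binary bytes rather than text.
        r.hset(prefix, suffix, struct.pack('>I', value))

    def fetch(field):
        digest = hashlib.md5(field.encode()).digest()
        raw = r.hget(digest[:2], digest[2:])
        return struct.unpack('>I', raw)[0] if raw else None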
The hash data type is probably the most awesome and underused feature of Redis. Here is a small gem I wrote to expose it a little better in Ruby: <a href="https://github.com/lyconic/redis-native_hash" rel="nofollow">https://github.com/lyconic/redis-native_hash</a>