
Storing hundreds of millions of simple key-value pairs in Redis

153 points by mikeyk over 13 years ago

10 comments

moxiemk1 over 13 years ago
The code the article links to for zipmap.c (https://github.com/antirez/redis/blob/unstable/src/zipmap.c) is *rather* literate.

I haven't dug extremely deeply into the sources for many F/OSS projects; the code I'm interested in reading has inevitably been opaque (at least to my inexperience). This particular source file (and maybe the rest of Redis?) is really good. I think I'll be taking many more looks at Redis (code- and usage-wise) in the future.
gorset over 13 years ago
1 million pairs using 16 MB is about 16 bytes per pair, which is perfectly fine but nothing impressive.

The dataset is static, so a simple naive solution would be to create a big array sorted by key. Assuming both photo and user IDs use 4 bytes each, this would result in about 2GB of data. Then use binary search to look up values.

However, if we really want to reduce the size, we could build a finite state machine from the dataset (maybe reversing the values to increase the level of shared suffixes), which should reduce the size by an order of magnitude.
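A minimal sketch of that naive sorted-array approach in Python (the record layout, names, and the 4-byte-per-ID assumption are mine, not from the comment):

```python
import struct

RECORD = struct.Struct("<II")  # 4-byte photo_id + 4-byte user_id = 8 bytes/pair

def build(pairs):
    # Sort by photo_id and pack into one contiguous byte string.
    # 300M pairs * 8 bytes comes out near the ~2GB the comment cites.
    return b"".join(RECORD.pack(p, u) for p, u in sorted(pairs))

def lookup(blob, photo_id):
    # Classic binary search over fixed-width records: O(log n), no index needed.
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        key, val = RECORD.unpack_from(blob, mid * RECORD.size)
        if key == photo_id:
            return val
        if key < photo_id:
            lo = mid + 1
        else:
            hi = mid
    return None
```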
cperciva over 13 years ago
*Best of all, lookups in hashes are still O(1), making them very quick.*

How quick is "very quick"? I was hoping to see some performance benchmarks, not just memory usage benchmarks.
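One rough way to get the lookup numbers being asked for, as a hedged sketch (redis-py and the key/field names are my assumptions; the post itself only reports memory figures):

```python
import time
import redis

r = redis.Redis()
r.hset("mediabucket:1155", "1155315", "939")  # one sample entry

N = 100_000
start = time.perf_counter()
for _ in range(N):
    r.hget("mediabucket:1155", "1155315")
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} lookups/sec (single client)")
```

Note a single synchronous client mostly measures network round-trip latency, not server-side throughput; pipelining or parallel clients would give a fairer ceiling.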
petercooper over 13 years ago
First, I love Redis :-)

Second, this functionality seems to be a stopgap to support old users who may be using old clients. So they need an array of 300 million elements, each containing an integer of 12 million or less. Assuming 32-bit values (24 would work but.. efficiency), that's a 1,144MB array which, in theory, could be stored as a file and cached through the filesystem cache.

I wonder how the performance of that would stack up against Redis. The convenience of Redis being a network daemon out of the box is potentially the big win here, though the memory usage even in the optimized case seems to be around 4x, given that it's doing more than the task necessarily requires (from my interpretation of it - I could be wrong!)
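A sketch of that flat-file idea in Python (file layout and names are my assumptions): the file is just 300M consecutive 32-bit user IDs, indexed by media ID, with the OS page cache doing the caching.

```python
import mmap
import struct

def lookup_user_id(path, media_id):
    # Offset into the file is simply media_id * 4 (one 32-bit slot per ID).
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            (user_id,) = struct.unpack_from("<I", mm, media_id * 4)
            return user_id
        finally:
            mm.close()
```

In practice you would open the mapping once and keep it around rather than remapping per lookup; it is reopened here only to keep the sketch self-contained.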
mythz over 13 years ago
I love companies like Instagram who take the time to share their scaling issues and experiences.
thezilch over 13 years ago
You can find similar examples -- Ruby code for the "exact" implementation as described by Instagram -- of this and other Redis memory optimizations at the Redis site: http://redis.io/topics/memory-optimization
catwell over 13 years ago
Lookups are not really O(1), they're O(number of keys per hash) as long as the hashes are zipmaps. When they become full-blown hashes, the memory usage increases.

Still, this is a very good way to store a lot of key/value pairs in Redis.
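For context, a sketch of the bucketing trick under discussion (my reconstruction of the article's scheme, using redis-py): each hash is capped at 1000 fields so it stays zipmap-encoded, which also bounds the linear scan catwell describes to at most 1000 entries.

```python
import redis

r = redis.Redis()

# Server-side, the compact encoding threshold must cover the bucket size;
# redis.conf of that era: hash-max-zipmap-entries 1000
# (hash-max-ziplist-entries on later Redis versions).

def set_owner(media_id, user_id):
    # Bucket key encodes media_id // 1000; full ID is the field inside it.
    r.hset(f"mediabucket:{media_id // 1000}", media_id, user_id)

def get_owner(media_id):
    return r.hget(f"mediabucket:{media_id // 1000}", media_id)
```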
pedigree over 13 years ago
Why use clear-text numbers? Most of the time you're going to be using large numbers, so binary-pack them and save more space.

I had the same issue: normal storage was 1.1GB of space, HSET got it down to 200MB, and binary packing every integer (dbl()) brought it right down to 163MB of memory (32-bit instance). For that 163MB, I was slicing an md5 of the field for the HSET prefix, packing that, and then using the remainder as the HSET suffix (due to the data format of the input field).
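To illustrate the packing point (a sketch; the sample ID is from the article, the md5 split widths are my guesses at pedigree's scheme):

```python
import hashlib
import struct

media_id = 1155315
as_text = str(media_id).encode()         # b'1155315' -> 7 bytes, 1 per digit
as_packed = struct.pack("<I", media_id)  # always 4 bytes for any 32-bit ID
print(len(as_text), len(as_packed))      # 7 4

# The md5 variant described: slice the digest into a bucket prefix and a
# field suffix, packing the prefix to shrink the key as well.
digest = hashlib.md5(b"some-input-field").digest()
prefix, suffix = digest[:2], digest[2:]  # split widths are hypothetical
key = b"h:" + prefix                     # HSET key
field = suffix                           # HSET field
```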
zellyn over 13 years ago
Why use the whole Media ID as the key within the bucket, rather than just the last three digits?
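zellyn's variant is a one-line change to the bucketing sketch above: since the bucket key already encodes media_id // 1000, the field only needs the remainder.

```python
import redis

r = redis.Redis()

def set_owner(media_id, user_id):
    # Field shrinks from the full ID to at most 3 digits; the pair
    # (bucket, remainder) still reconstructs the original media_id.
    r.hset(f"mediabucket:{media_id // 1000}", media_id % 1000, user_id)

def get_owner(media_id):
    return r.hget(f"mediabucket:{media_id // 1000}", media_id % 1000)
```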
examancer over 13 years ago
The hash data type is probably the most awesome and underused feature of Redis. Here is a small gem I wrote to expose it a little better in Ruby: https://github.com/lyconic/redis-native_hash