Instagram made a blog post[1] (on their engineering Tumblr) about exploiting similar compression techniques.<p>Using that technique, I've personally had great success storing data in hashes by JSON-encoding it beforehand. Normally the data would be stored like so:<p><pre><code> HSET user:159784623 username ihsw
HSET user:159784623 email blahblah@gmail.com
HSET user:159784623 created_at 1377986411
</code></pre>
But instead it's like so:<p><pre><code> HSET user:156039 687 {"username":"ihsw","email":"blahblah@gmail.com","created_at":1377986411}
</code></pre>
Here we divide the data into "buckets" of size 1024; given an ID of 159784623, the resulting bucket ID is 156039 and the remainder is 687.<p><pre><code> id = 159784623
bucket_size = 1024
remainder = id % bucket_size            # 687
bucket_id = (id - remainder) / bucket_size  # 156039
</code></pre>
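To make that concrete, here is a minimal sketch of reading and writing through this bucketing scheme, assuming Python with the redis-py client (the helper names and key prefix are just illustrative, not anything from the original post):<p><pre><code> import json
import redis

r = redis.Redis()
BUCKET_SIZE = 1024

def bucket(user_id):
    # Split an ID into (bucket_id, remainder) per the scheme above
    remainder = user_id % BUCKET_SIZE
    bucket_id = (user_id - remainder) // BUCKET_SIZE
    return bucket_id, remainder

def save_user(user_id, data):
    # Store the JSON blob as one field of the bucketed hash
    bucket_id, remainder = bucket(user_id)
    r.hset("user:%d" % bucket_id, remainder, json.dumps(data))

def load_user(user_id):
    bucket_id, remainder = bucket(user_id)
    raw = r.hget("user:%d" % bucket_id, remainder)
    return json.loads(raw) if raw else None

save_user(159784623, {"username": "ihsw",
                      "email": "blahblah@gmail.com",
                      "created_at": 1377986411})
</code></pre>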
Using this I've been able to reduce memory usage anywhere from 40% to 80% (yes, 80%), depending on the compressibility of the data (the length and randomness of each hash item).<p>I've also been replacing dictionary keys with integers, which reduces the size of the stored data by a further ~30%.<p><pre><code> HSET user:156039 687 {"0":"ihsw","1":"blahblah@gmail.com","2":1377986411}
</code></pre>
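A minimal sketch of that key-shortening step, again assuming Python (the KEY_MAP table and helper names are made up for illustration):<p><pre><code> import json

KEY_MAP = {"username": "0", "email": "1", "created_at": "2"}
REVERSE_MAP = {v: k for k, v in KEY_MAP.items()}

def shrink(data):
    # Replace verbose field names with short integer keys before encoding
    return json.dumps({KEY_MAP[k]: v for k, v in data.items()})

def expand(raw):
    # Restore the original field names after decoding
    return {REVERSE_MAP[k]: v for k, v in json.loads(raw).items()}

blob = shrink({"username": "ihsw",
               "email": "blahblah@gmail.com",
               "created_at": 1377986411})
# blob == '{"0": "ihsw", "1": "blahblah@gmail.com", "2": 1377986411}'
</code></pre>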
These simple techniques shouldn't be underestimated; the gains are considerable. JSON-encoded data is still fairly verbose, so CSV may squeeze out additional savings, but JSON accommodates missing keys more gracefully.<p>Lists and sets can also accommodate "bucketing" of data, though that comes with the added complexity of supporting the variety of Redis commands those data structures bring (BLPOP, SADD, SDIFF, etc.).<p>[1] <a href="http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value-pairs" rel="nofollow">http://instagram-engineering.tumblr.com/post/12202313862/sto...</a>