I love reading about how companies scale their BigHuge data, but it bothers me that we still haven't reached the point where scalability is a commodity instead of a patchwork of technologies that every company pieces together in its own way.
"Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attributes like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value." I hope they've introduced some NoSQL sweetness by now.
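The quoted design is essentially an entity-attribute-value layout: a narrow table of common columns plus a key/value side table. Below is a minimal sketch of that shape, assuming Python's built-in sqlite3 instead of Reddit's Postgres; the table and column names (`thing`, `data`, `ups`, `downs`, `created`) and the helpers `create_thing`/`load_thing` are hypothetical, not Reddit's actual schema or code.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema modeled on the quoted description (not Reddit's real DDL).
SCHEMA = """
CREATE TABLE thing (
    id      INTEGER PRIMARY KEY,
    type    TEXT    NOT NULL,          -- e.g. 'link', 'comment', 'user'
    ups     INTEGER NOT NULL DEFAULT 0,
    downs   INTEGER NOT NULL DEFAULT 0,
    created TEXT    NOT NULL
);
CREATE TABLE data (
    thing_id INTEGER NOT NULL REFERENCES thing(id),
    key      TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (thing_id, key)
);
"""

def create_thing(conn, thing_type, **attrs):
    """Insert one Thing row, then one Data row per type-specific attribute."""
    cur = conn.execute(
        "INSERT INTO thing (type, created) VALUES (?, ?)",
        (thing_type, datetime.now(timezone.utc).isoformat()),
    )
    thing_id = cur.lastrowid
    # Every extra attribute becomes a (thing_id, key, value) row stored as text.
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id

def load_thing(conn, thing_id):
    """Rebuild a Thing as a dict: common columns from `thing`, the rest from `data`."""
    row = conn.execute(
        "SELECT id, type, ups, downs, created FROM thing WHERE id = ?", (thing_id,)
    ).fetchone()
    if row is None:
        return None
    thing = dict(zip(("id", "type", "ups", "downs", "created"), row))
    for key, value in conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    ):
        thing[key] = value
    return thing

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    link_id = create_thing(conn, "link", title="Hello", url="http://example.com")
    print(load_thing(conn, link_id))
```

The appeal is that a new attribute needs no `ALTER TABLE`; the cost is that reading one object means pulling many narrow rows, and every value comes back as untyped text.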
Their most recent infrastructure blog post, from January 2012, shows that they're using Postgres 9, Cassandra 0.8, and local disk only (no more EBS). I'm curious whether the recently announced Provisioned IOPS would let them go back to EBS. http://blog.reddit.com/2012/01/january-2012-state-of-servers.html
My only disagreement (although I feel like I'm arguing with Linus about git) is with lesson 5, memcaching session data: don't do it. Memcache's 1 MB maximum item size (raising that limit costs too much performance to be viable) introduces an "I need to constantly worry about my sessions getting too big" mental overhead that isn't worth it. Go with Redis for storing session data.
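To make that mental overhead concrete, here is a minimal sketch of the guard an application ends up carrying when sessions live in memcache. It assumes JSON serialization and memcached's default 1 MB item limit (adjustable with memcached's `-I` option); the constant and function names are hypothetical.

```python
import json

# memcached's default maximum item size; the usable payload is slightly smaller
# because the key and item header count against the same limit.
MEMCACHED_DEFAULT_ITEM_LIMIT = 1024 * 1024  # bytes

def serialized_size(session: dict) -> int:
    """Bytes the session occupies once serialized (compact JSON assumed)."""
    return len(json.dumps(session, separators=(",", ":")).encode("utf-8"))

def fits_in_memcached(session: dict, limit: int = MEMCACHED_DEFAULT_ITEM_LIMIT) -> bool:
    """True if the serialized session is under the limit (ignores key/header overhead)."""
    return serialized_size(session) < limit
```

A Redis string value can hold up to 512 MB, so for session-sized data this check mostly stops being something you have to think about.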