Having no experience with database/website administration myself, I'm struck by just how <i>little</i> I'm able to translate the work and concepts in this post into actual, manual labor.<p>For each and every thing that Jason talked about (upgrading Cassandra, moving off EBS, embarking on self-heal and auto-scale projects), what took the reader a few seconds to read and comprehend undoubtedly represented hours and hours of work on the part of the Reddit admins.<p>I guess it's just the nature of the human mind. I don't think I could ever fully appreciate the amount of work that goes into <i>any</i> project unless I've been through it myself (and even then, the brain is awesome at minimizing the memory of pain). So Reddit admins, if you're reading this: while I certainly can't fully appreciate the amount of labor and life-force you've dedicated to the site, I honestly do appreciate it, and I wish you guys nothing but success in the future!
It's interesting to see that they're sticking with Cassandra, and that they're having a much better experience with 0.8. I've been hearing so many fellow coders in SF hate on Cassandra that I had stopped considering it for projects. Has anybody worked with 0.8 or 1.0? Would you recommend Cassandra?<p>I got to work with Riak a lot while I was at DotCloud, but the speed issue was pretty frustrating (it can be painfully slow).
It reminds me of Slashdot circa 1998/99, back when we watched those guys grow their then-new-found popularity out of a dorm-room Linux box; at a time when the web was a mere fraction of the size it is today.<p>Godspeed, reddit. You're on the right track.
They say they moved off EBS and onto local storage for Postgres and saw a big increase in reliability and performance.<p>I did the same for my site last year and it was great.<p>This is one of the reasons why I haven't moved my Postgres databases to EnterpriseDB or Heroku: they use EBS.
I'm unfamiliar with hosting costs, or really any of the costs of running a site as popular as reddit. Does anyone with experience in this area have a ballpark figure for how much it would cost per month to run this sort of setup?
I wonder how much of that 2TB dataset is necessary for the common daily functionality of reddit. Probably less than 1%, with the rest being historical data accessed by almost no one, except perhaps by the submission-dupe-checking algorithms and the like?
Those are staggering numbers; glad I invested my time in reddit last year. We must be cautious of overheating, though: signs of a bubble, or a possible subreddit crisis.
Running a DB on a single spindle, and they have performance problems?<p>I couldn't imagine why.<p>2 TB, OMG, that's almost a decent-sized SQL Server instance. Yeah, it should take about an hour or two to replicate. I'm assuming they have 10Gb Ethernet on their DB server.
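For anyone who wants to sanity-check the "hour or two" figure, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 10Gb = 10^10 bits/s); the `efficiency` factor is a made-up knob for protocol overhead and disk limits, not a number from the post.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float,
                     efficiency: float = 1.0) -> float:
    """Seconds to move size_bytes over a link running at a fraction of line rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_sec * efficiency)


TB = 10**12    # decimal terabyte, in bytes (assumption)
GBIT = 10**9   # decimal gigabit, in bits (assumption)

# Pure line rate: the lower bound, with no overhead at all.
ideal = transfer_seconds(2 * TB, 10 * GBIT)

# Hypothetical 25% effective throughput, e.g. when the disks are the bottleneck.
realistic = transfer_seconds(2 * TB, 10 * GBIT, efficiency=0.25)

print(f"line rate: {ideal / 60:.0f} min")
print(f"25% of line rate: {realistic / 3600:.1f} h")
```

At full line rate that comes to under half an hour, so "an hour or two" corresponds to sustaining very roughly a quarter to a half of the link. On a single spindle the disk, not the network, would likely be the limit.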