Reddit: Lessons Learned From Scaling To 1 Billion Pageviews A Month

229 points by jpmc over 11 years ago

18 comments

You notice how in these recaps, all you read about is "I learned that X does Y"? They don't seem to have much in the way of lessons to take heed of for all situations. It's more like, "If you use this specific key/value store, tweak the thingimabob to sassyfraz to make sure your dingo does wibblydong." So if my platform doesn't use that store, your lesson is pointless. If it's a problem with an application, it's great that you're pointing it out, but if it was just oversight by lazy engineers, leave it out.

Then there are the wise lessons on general topics, like the idea that you should "wait until your site grows so you can learn where your scaling problems are going to be". I'm pretty sure we know what your scaling problems are going to be. Every single resource in your platform and the way they are used will eventually pose a scaling problem. Wait until they become a problem, or plan for them to become a problem?

I'm not that crazy. It really doesn't take a lot of time to plan ahead. Just think about what you have, take an hour or two and come up with some potential problems. Then sort the problems by how imminent and how horrible they are and make a roadmap to fix them. I know nobody likes to take time to reflect before they start banging away, but consider architectural engineering. Without careful planning, the whole building may fall apart. (Granted, nobody's going to die when your site falls apart, but it's a good mindset to be in.)

gbog over 11 years ago

> Stay as schemaless as possible. It makes it easy to add features. All you need to do is add new properties without having to alter tables.

And at the same time they use and praise Postgres a lot, so it cannot be about NoSQL.

I am wondering what they mean exactly. Following my own tendency, it would mean using a few very big and narrow tables in the form of "who - do - what - when - where", e.g. "userA - vote up - comment1 - timestamp - foosubreddit", and also "userB - posted - link1 - timestamp - barsubreddit".

Then in the same table you get more or less all the events happening on the site, and you are somewhat schemaless, in the sense that adding new functionality does not require a schema change.

If someone with inside knowledge can confirm this is not too far from what the reddit team meant, I'd appreciate it.
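To make the "who - do - what - when - where" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It only illustrates the narrow-table layout guessed at above and is not a description of reddit's actual schema. The table name `events`, the column names, and the helpers `make_db`, `record`, and `events_for` are all hypothetical.

```python
import sqlite3
import time


def make_db():
    """Create an in-memory database with one narrow event table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " actor TEXT NOT NULL,"        # who
        " verb TEXT NOT NULL,"         # do
        " target TEXT NOT NULL,"       # what
        " happened_at REAL NOT NULL,"  # when
        " place TEXT NOT NULL)"        # where
    )
    return conn


def record(conn, actor, verb, target, place, when=None):
    """Append one event; a new kind of verb needs no schema change."""
    stamp = time.time() if when is None else when
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (actor, verb, target, stamp, place),
    )


def events_for(conn, actor):
    """Return (verb, target, place) tuples for one actor, oldest first."""
    rows = conn.execute(
        "SELECT verb, target, place FROM events"
        " WHERE actor = ? ORDER BY happened_at, rowid",
        (actor,),
    )
    return rows.fetchall()


conn = make_db()
record(conn, "userA", "vote up", "comment1", "foosubreddit", when=1.0)
record(conn, "userB", "posted", "link1", "barsubreddit", when=2.0)
# A feature added later is just a new verb value, not an ALTER TABLE.
record(conn, "userA", "saved", "link1", "barsubreddit", when=3.0)
```

The trade-off in this sketch is that the database no longer knows which verbs or targets are valid, so that checking moves into application code.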

na85 over 11 years ago

Reddit is an interesting case; they seem to have almost unlimited amounts of user good will. Case in point: I get the "you broke reddit" pageload failure message an awful lot and I'm sure others do too. How many other sites have userbases that would tolerate such a high number of errors?

continuations over 11 years ago

> For comments it’s very fast to tell which comments you didn’t vote on, so the negative answers come back quickly.

Can you go into more detail about how this is used? If reddit needs to display a page that has 100 comments, do they query Cassandra on the voting status of the user on those 100 comments?

I thought Cassandra was pretty slow in reads (slower than Postgres), so how does using Cassandra make it fast here?

jzelinskie over 11 years ago

This looks like a summary of the talk on InfoQ on the subject: http://www.infoq.com/presentations/scaling-reddit

727374 over 11 years ago

"Treat nonlogged in users as second class citizens. By always giving logged out always cached content Akamai bears the brunt for reddit’s traffic. Huge performance improvement."

This is the lowest of low-hanging fruit. Many people don't realize it, but a ton of huge media sites use Akamai to offload most of their "read-only" traffic.
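As a rough illustration of how an origin server might let a CDN absorb logged-out traffic, here is a minimal sketch that picks a `Cache-Control` header based on login state. The function name `cache_headers` and the 60-second lifetime are assumptions for the example, not reddit's or Akamai's actual configuration.

```python
def cache_headers(logged_in: bool, max_age: int = 60) -> dict:
    """Choose response headers so shared caches only store anonymous pages.

    Hypothetical helper: the header values are standard HTTP directives,
    but the policy and the default lifetime are assumptions.
    """
    if logged_in:
        # Personalized page: shared caches such as a CDN must not store it.
        return {"Cache-Control": "private, no-store"}
    # Anonymous page: identical for everyone, so a shared cache may serve it.
    return {"Cache-Control": f"public, s-maxage={max_age}"}


anonymous = cache_headers(logged_in=False)
member = cache_headers(logged_in=True)
```

With a policy like this, only requests from logged-in users need to reach the application servers on every page view.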

human_error over 11 years ago

> Used the Pylons (Django was too slow), a Python based framework, from the start

This isn't quite right. It was web.py at the beginning. They started using Pylons after the Conde Nast acquisition.

falcolas over 11 years ago

I can certainly appreciate what Reddit has accomplished, but the thought of losing the abilities of a full RDBMS for a key-value store makes my hair stand on end.

I've yet to find schema changes limiting in my ability to code against a DB (and I use MySQL, which is one of the most limiting in this regard). Plus, I appreciate the ability to offload things like data consistency and relationships to the database. I understand, however, where others might not feel the same way.

chrismealy over 11 years ago

Queues were a saviour. When passing work between components put it into a queue. You get a nice little buffer.

What does reddit use for queuing?
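For anyone unfamiliar with the "nice little buffer" idea, here is a minimal in-process sketch using Python's standard `queue` and `threading` modules. It stands in for a real message broker and says nothing about which queueing system reddit runs. The function name `run_pipeline`, the doubling "work", and the `None` sentinel are all assumptions for the example.

```python
import queue
import threading


def run_pipeline(jobs, buffer_size=100):
    """Pass jobs from a producer to one consumer through a bounded queue."""
    work = queue.Queue(maxsize=buffer_size)  # the buffer between components
    results = []

    def consumer():
        while True:
            job = work.get()
            if job is None:  # sentinel: the producer has finished
                break
            results.append(job * 2)  # placeholder for the real work

    worker = threading.Thread(target=consumer)
    worker.start()

    # Producer side: put() only blocks when the buffer is full, so short
    # bursts are absorbed without the producer waiting on the consumer.
    for job in jobs:
        work.put(job)
    work.put(None)

    worker.join()
    return results


doubled = run_pipeline([1, 2, 3])
```

Because the queue is bounded, a consumer that falls far behind eventually slows the producer down instead of letting memory grow without limit.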

WestCoastJustin over 11 years ago

This appears to be a summary of an InfoQ presentation, which was discussed about two weeks ago @ https://news.ycombinator.com/item?id=6222726

jjwiseman over 11 years ago

"Do not keep secret keys on the instance." I'm curious how people deal with this--what approaches do you use?

misiti3780 over 11 years ago

Is it common for people to use Postgres as a key-value store in production (rather than Redis)? This is the first time I have heard of it, and I am just starting to use Postgres now, so I was a bit surprised.

exhaze over 11 years ago

Jeremy also gave a great Airbnb tech talk on this topic: http://nerds.airbnb.com/reddit-netflix-and-beyond-building-scalable-and-reliable-architectures-in-the-cloud/

callmeed over 11 years ago

Can someone elaborate/clarify this:

> Users connect to a web tier which talks to an application tier.

So, I'm assuming the web tier is nginx/haproxy and the application tier is Pylons.

Are the 240 servers mentioned all running both the web tier and the app tier?

chum over 11 years ago

> Recode Python functions in C

From a security standpoint, this sounds like a bad idea.

ivanbrussik over 11 years ago

Just out of curiosity, what does "stay as schemaless as possible" mean? That did not read right.

skeletonjelly over 11 years ago

jedberg - you speak of automation, did you use anything (or is there anything in use currently) that handles auto scaling for EC2? puppet/chef/ansible etc? Or was this all done by hand?

srj55 over 11 years ago

hmm...no love for django here.
