Reddit: Lessons Learned From Scaling To 1 Billion Pageviews A Month

229 points by jpmc, over 11 years ago

18 comments

peterwwillis, over 11 years ago
You notice how in these recaps, all you read about is "I learned that X does Y"? They don't seem to have much in the way of lessons to take heed of for all situations. It's more like, "If you use this specific key/value store, tweak the thingimabob to sassyfraz to make sure your dingo does wibblydong." So if my platform doesn't use that store, your lesson is pointless. If it's a problem with an application, it's great that you're pointing it out, but if it was just oversight by lazy engineers, leave it out.

Then there are the wise lessons on general topics, like the idea that you should "wait until your site grows so you can learn where your scaling problems are going to be". I'm pretty sure we *know* what your scaling problems are going to be. Every single resource in your platform and the way they are used will eventually pose a scaling problem. Wait until they become a problem, or *plan* for them to become a problem?

I'm not that crazy. It really doesn't take *a lot* of time to plan ahead. Just think about what you have, take an hour or two and come up with some potential problems. Then sort the problems by how imminent and how horrible they are, and make a roadmap to fix them. I know nobody likes to take time to reflect before they start banging away, but consider architectural engineering. Without careful planning, the whole building may fall apart. (Granted, nobody's going to die when your site falls apart, but it's a good mindset to be in.)

gbog, over 11 years ago
> Stay as schemaless as possible. It makes it easy to add features. All you need to do is add new properties without having to alter tables.

And at the same time they use and praise Postgres a lot, so it cannot be about NoSQL.

I am wondering what they mean exactly. My own inclination would be to use a few very big and narrow tables in the form of "who - do - what - when - where", e.g. "userA - vote up - comment1 - timestamp - foosubreddit", and also "userB - posted - link1 - timestamp - barsubreddit".

Then in the same table you get more or less all events happening on the site, and you are somewhat schemaless, in the sense that adding new functionality does not require a schema change.

If someone with inside knowledge can confirm this is not too far from what the reddit team meant, I'd appreciate it.
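
The narrow "who - do - what - when - where" table gbog proposes can be sketched as follows. This is only an illustration of that guess, not reddit's actual schema; the table name `events`, its column names, the timestamps, and the helper functions are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: one narrow table that records every kind of site event.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        actor    TEXT    NOT NULL,  -- who
        action   TEXT    NOT NULL,  -- do
        target   TEXT    NOT NULL,  -- what
        happened INTEGER NOT NULL,  -- when (Unix timestamp)
        place    TEXT    NOT NULL   -- where
    )
    """
)


def record_event(actor, action, target, happened, place):
    """Append one event row; a new kind of action needs no ALTER TABLE."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (actor, action, target, happened, place),
    )


def actions_by(actor):
    """Return (action, target) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT action, target FROM events WHERE actor = ? ORDER BY happened",
        (actor,),
    )
    return rows.fetchall()


record_event("userA", "vote up", "comment1", 1377500000, "foosubreddit")
record_event("userB", "posted", "link1", 1377500060, "barsubreddit")
# A feature added later ("saved") reuses the same five columns.
record_event("userA", "saved", "link1", 1377500120, "barsubreddit")
```

Calling `actions_by("userA")` returns both the vote and the later "saved" event, even though "saved" did not exist as an action when the table was created. The trade-off is that the database can no longer enforce per-feature constraints on these rows.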

na85, over 11 years ago
Reddit is an interesting case; they seem to have almost unlimited amounts of user good will. Case in point: I get the "you broke reddit" pageload failure message an awful lot and I'm sure others do too. How many other sites have userbases that would tolerate such a high number of errors?

continuations, over 11 years ago
> For comments it’s very fast to tell which comments you didn’t vote on, so the negative answers come back quickly.

Can you go into more detail about how this is used? If reddit needs to display a page that has 100 comments, do they query Cassandra on the voting status of the user on those 100 comments?

I thought Cassandra was pretty slow at reads (slower than Postgres), so how does using Cassandra make it fast here?

jzelinskie, over 11 years ago
This looks like a summary of the talk on InfoQ on the subject:

http://www.infoq.com/presentations/scaling-reddit

727374, over 11 years ago
"Treat nonlogged in users as second class citizens. By always giving logged out users cached content, Akamai bears the brunt of reddit’s traffic. Huge performance improvement."

This is the lowest of low-hanging fruit. Many people don't realize it, but a ton of huge media sites use Akamai to offload most of their "read-only" traffic.
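
The split between logged-out and logged-in traffic usually comes down to which responses a shared cache is allowed to keep. A minimal sketch, assuming the origin marks responses with standard `Cache-Control` directives; the function name `cache_headers`, the `session_cookie` parameter, and the 60-second lifetime are hypothetical, and reddit's actual Akamai configuration is not described in the thread.

```python
def cache_headers(session_cookie):
    """Return response headers; session_cookie is None for logged-out users."""
    if session_cookie is None:
        # Every anonymous visitor sees the same page, so a CDN may store it
        # and answer later requests without contacting the origin.
        return {"Cache-Control": "public, max-age=60"}
    # Logged-in pages are personalized, so shared caches must not keep them.
    return {"Cache-Control": "private, no-store"}
```

With headers like these, only logged-in requests reach the application servers on every page view; anonymous traffic is absorbed by the CDN until the cached copy expires.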

human_error, over 11 years ago
> Used the Pylons (Django was too slow), a Python based framework, from the start

This isn't quite right. It was web.py at the beginning. They started using Pylons after the Conde Nast acquisition.

falcolas, over 11 years ago
I can certainly appreciate what Reddit has accomplished, but the thought of losing the abilities of a full RDBMS for a key-value store makes my hair stand on end.

I've yet to find schema changes limiting in my ability to code against a DB (and I use MySQL, which is one of the most limiting in this regard). Plus, I appreciate the ability to offload things like data consistency and relationships to the database. I understand, however, where others might not feel the same way.

chrismealy, over 11 years ago
> Queues were a saviour. When passing work between components put it into a queue. You get a nice little buffer.

What does reddit use for queuing?
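
The buffering effect in the quoted advice can be illustrated with an in-process queue. This sketch uses Python's standard `queue.Queue` and a single worker thread purely as a stand-in; it says nothing about which message broker reddit runs, and the doubling step is a placeholder for real processing.

```python
import queue
import threading

work = queue.Queue(maxsize=100)   # bounded buffer between the two components
results = []


def worker():
    """Consume items at its own pace until the None sentinel arrives."""
    while True:
        item = work.get()
        if item is None:          # sentinel: the producer is finished
            work.task_done()
            break
        results.append(item * 2)  # placeholder for the slow processing step
        work.task_done()


consumer = threading.Thread(target=worker)
consumer.start()

for vote in range(10):            # the producer only waits if the buffer is full
    work.put(vote)
work.put(None)

work.join()                       # block until every queued item is processed
consumer.join()
```

The producer hands off each item and moves on, so a temporary slowdown in the consumer shows up as a longer queue instead of a slower front end.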

WestCoastJustin, over 11 years ago
This appears to be a summary of an InfoQ presentation, which was discussed about two weeks ago @ https://news.ycombinator.com/item?id=6222726

jjwiseman, over 11 years ago
"Do not keep secret keys on the instance." I'm curious how people deal with this--what approaches do you use?

misiti3780, over 11 years ago
Is it common for people to use Postgres as a key-value store in production (rather than Redis)? This is the first time I have heard of it, and I am just starting to use Postgres now, so I was a bit surprised.

exhaze, over 11 years ago
Jeremy also gave a great Airbnb tech talk on this topic:

http://nerds.airbnb.com/reddit-netflix-and-beyond-building-scalable-and-reliable-architectures-in-the-cloud/

callmeed, over 11 years ago
Can someone elaborate/clarify this:

> Users connect to a web tier which talks to an application tier.

So, I'm assuming the web tier is nginx/haproxy and the application tier is Pylons.

Are the 240 servers mentioned all running *both* the web tier and the app tier?

chum, over 11 years ago
> Recode Python functions in C

From a security standpoint, this sounds like a bad idea.

ivanbrussik, over 11 years ago
Just out of curiosity, what does "stay as schemaless as possible" mean? That did not read right.

skeletonjelly, over 11 years ago
jedberg - you speak of automation. Did you use anything (or is there anything in use currently) that handles auto scaling for EC2? Puppet/Chef/Ansible etc.? Or was this all done by hand?

srj55, over 11 years ago
Hmm... no love for Django here.