Hey fellow hackers,<p>I am a programmer, but most of my work doesn't involve the web. The most I have done with web development is an EC2 instance running a LAMP stack. I am working on a project right now, and my concern is that my single-server experience will limit me. I know the gist of scaling (traffic routing, memcaching, sharding), but I don't really know how to go about setting it up. How do you even predict whether you'll need to scale like that? Can scaling be done in a modular fashion? I want to learn about web scale :) Any pointers in the right direction would be greatly appreciated. If you could describe a full-stack solution rather than just one or two technologies, that would be great.<p>Thanks in advance!
The best way to learn web scalability is to work at companies that have serious scaling problems. They tend to a) develop a lot of organic knowledge about scaling, including by pushing the state of the art forward, and b) almost by definition have more money than God, so they can spend copiously on hiring and training.<p>One thing companies with serious scaling problems will do is send you to places like JavaOne (and many, many more specialized conferences besides), where you'll hear presentations like LinkedIn's on how to make writes immediately consistent for the originating user, then fan them out to other users over the next few minutes, and what that requirement does to a three-tier architecture.<p>There's also a lot online. High Scalability will get mentioned, and I tend to keep my ears to the ground for interesting conferences and then see if they post videos and/or slide decks (SlideShare is wonderful for these).<p>OK, that's the answer you want. Here's the answer you need: you are overwhelmingly unlikely to have scaling problems. Single servers are big-n-beefy things, even single VPSes. Depending on what you're doing with it, a single EC2 instance running a non-optimized LAMP stack can easily serve hundreds of thousands of users and/or support millions of dollars in revenue. The questions a) where do I get hundreds of thousands of users? and b) how do I convince businesses to pay $100k a month for my software? are of much more immediate and pressing concern to you than how you would take that very successful business to 10x or 100x its hypothetical size.<p>(It's like the worry "I'm concerned my experience with managing a household budget will not prepare me for the challenges posed by a 9-figure bank account after I exit. How will I ever cope? Where can I learn about the tax challenges and portfolio allocation strategies of the newly wealthy?" The obvious answer is "You'll hire someone to do that for you. Now, get closer to exiting than 'having no product and no users' and worry about being a hundred-millionaire some other day.")
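To make the LinkedIn example a bit more concrete, here is a toy Python sketch of that general pattern: the author's own write is visible immediately (read-your-writes), while delivery to followers is queued and handled later by a background worker. Everything here is an assumption for illustration. `FeedService`, `post`, and `run_fanout` are hypothetical names, the store is an in-memory dict, and a plain `deque` stands in for a real message queue. It is not LinkedIn's actual design.

```python
from collections import defaultdict, deque


class FeedService:
    """Toy feed store: the author sees a post at once, followers only
    after the (hypothetical) background fan-out step has run."""

    def __init__(self, followers):
        self.followers = followers            # author -> list of follower ids
        self.timelines = defaultdict(list)    # user -> posts visible to that user
        self.pending = deque()                # queued fan-out jobs

    def post(self, author, text):
        # Synchronous path: the originating user reads their own write immediately.
        self.timelines[author].append((author, text))
        # Asynchronous path: defer delivery to everyone else.
        self.pending.append((author, text))

    def run_fanout(self, max_jobs=None):
        # Stand-in for a worker draining the queue over the next few minutes.
        done = 0
        while self.pending and (max_jobs is None or done < max_jobs):
            author, text = self.pending.popleft()
            for follower in self.followers.get(author, []):
                self.timelines[follower].append((author, text))
            done += 1
        return done

    def timeline(self, user):
        return list(self.timelines[user])


svc = FeedService({"alice": ["bob", "carol"]})
svc.post("alice", "hello")
print(svc.timeline("alice"))  # [('alice', 'hello')]
print(svc.timeline("bob"))    # [] because fan-out hasn't run yet
svc.run_fanout()
print(svc.timeline("bob"))    # [('alice', 'hello')]
```

The point of the split is that the synchronous path touches one timeline no matter how many followers the author has, so posting stays fast; the cost is that followers see the post only once the worker catches up.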
High Scalability is a great blog with lots of case studies and best practices.<p><a href="http://highscalability.com/start-here/" rel="nofollow">http://highscalability.com/start-here/</a>
Scalability is like sex. You can spend years thinking about it, reading about it, watching the videos and even practicing on your own... but nothing beats actually doing it. And the more you work at scale, the better you get at anticipating and troubleshooting the problems that come with increasing demand.
Slightly dated, but I recommend Cal Henderson's book on the lessons we learned scaling Flickr, <a href="http://www.amazon.com/Building-Scalable-Web-Sites-Applications/dp/0596102356" rel="nofollow">http://www.amazon.com/Building-Scalable-Web-Sites-Applicatio...</a><p>Also remember that premature scaling is one of the leading causes of failure.
I'd like to recommend "Web Operations" by Allspaw<p><a href="http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440" rel="nofollow">http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/14...</a>
Well, it's easier to say what not to do than what to do, so here's some of that:<p>- Cargo-cult buzzphrase/keyword engineering: saying "we should use <thing>, <cool_company_a> and <cool_company_b> use <thing>". If you find yourself debating technology where entire paragraphs go by without real metrics, then you're really social-signaling, not engineering.<p>- Scaling for the sake of scaling: there is a vast grab bag of available tools and techniques for scaling, but for any given business the right answer is to take a pass on most of it. The overwhelming majority will be either unnecessary or flat-out counterproductive. The easiest way to separate what to scale from what to hack/ignore is to ask yourself, "Is this what my users/customers love/pay me for, or something a grad student would love to spend a semester on?"<p>- Timing context matters, a lot: what's "right" for scaling a young company with very few engineers (and probably zero ops pros) will be different from what's "right" for scaling a millions-of-users/millions-in-revenue company. What's right for Twitter is <i>wrong</i> for a young company, almost by definition.<p>- "Throw hardware at it" (the wrong way): throwing hardware at a problem is <i>the best</i> way to solve scaling problems, but only if you do it right. Stateless request-response across large pools of identical servers scales better than anything else. However, "give it a dedicated server" leaves you with a minefield of one- and two-off setups that only scales to a few dozen servers before it starts to choke your business.
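Here is a toy Python sketch of "stateless request-response across large pools of identical servers", assuming a shared external store (just a dict here, standing in for a database or cache) and a simple round-robin balancer. `SharedStore`, `AppServer`, and `RoundRobinBalancer` are hypothetical names, not any product's API. Because no server keeps per-user state in its own memory, consecutive requests from the same user can land on different servers and still see the same data.

```python
import itertools


class SharedStore:
    """Stand-in for an external store (database, cache) shared by all servers."""

    def __init__(self):
        self.data = {}


class AppServer:
    """A stateless app server: it keeps nothing between requests,
    so any instance in the pool can handle any request."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, user, item):
        # All per-user state lives in the shared store, not in this process.
        cart = self.store.data.setdefault(user, [])
        cart.append(item)
        return self.name, len(cart)


class RoundRobinBalancer:
    """Hands each request to the next server in the pool, in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def route(self, user, item):
        return next(self._cycle).handle(user, item)


store = SharedStore()
pool = [AppServer(f"web{i}", store) for i in range(3)]
lb = RoundRobinBalancer(pool)
for item in ["a", "b", "c", "d"]:
    # Prints ('web0', 1), ('web1', 2), ('web2', 3), ('web0', 4), one per line.
    print(lb.route("alice", item))
```

In this model, scaling out just means building the balancer over a bigger pool of identical `AppServer` instances. A one-off "dedicated server" for some feature is exactly the kind of special case a balancer like this can't treat interchangeably.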
Udacity's Web Application Engineering course is pretty good.<p><a href="http://www.udacity.com/view#Course/cs253/CourseRev/apr2012/Unit/366003/Nugget/525001" rel="nofollow">http://www.udacity.com/view#Course/cs253/CourseRev/apr2012/U...</a><p>In particular, Units 6 and 7 talk about scaling.<p>The course uses GAE, but the concepts apply everywhere. The professor is Steve Huffman, who has practical experience scaling Reddit and now Hipmunk.
Learn by doing. And don't worry too much about it until the need comes. Luckily, the problem itself implies you're doing well, so the effort of dealing with it when the time comes will pay for itself.
Grow a site to tens of millions of users. Spend nights and weekends for several years keeping it up.<p>:) Otherwise get a job at some company that is doing that presently.
buro9 answered this question very well, and I just link to his comment whenever the question comes up again: <a href="http://news.ycombinator.com/item?id=2249789" rel="nofollow">http://news.ycombinator.com/item?id=2249789</a>
I might be showing a lack of knowledge here, and I doubt I'm as well-equipped to answer this as others on this thread, but here are my thoughts.<p>Learn functional programming, because it will matter more as parallelism becomes more important. Learn enough about concurrency and parallelism to have a good sense of the various approaches (threads, actors, software transactional memory) and their tradeoffs. Learn what databases are and why they're important. Learn about relational (SQL) databases, transactions, and ACID, and what it means not to be ACID-compliant (and why that can be OK). Learn a NoSQL database. I frankly don't much like most of them, but they solve an important problem that relational databases need a lot more massaging to attack.<p>All this said, I think focusing on "web scalability" is the wrong approach. Focus on the practical half of computer science rather than on "scalability". I feel like "scaling" is, to a large degree, a business wet dream / anxious nightmare associated with extremely public, breakout success (or catastrophic, embarrassing failure), and that most people would do better to learn the fundamentals than to fixate on "scaling" for its own sake.<p>Bigness in technology isn't innately good. All else being equal, it's very, very bad. Sometimes the difficulty is intrinsic, and that's a good thing because it means you're solving a hard problem, but difficulty for difficulty's sake is a bad pursuit. Software is already hard; there's no point in making it harder.<p>Finally, learn the Unix philosophy and small-program architecture. Learn how to design code elegantly. Learn why object-oriented programming (as seen over the past 25 years) is a massive wad of unnecessary complexity and a trillion-dollar mistake. Then learn what OOP looks like when done right, because sometimes it really delivers. On that note, most scaling problems come from object-oriented Big Code disasters that are supposed to perform well on account of all the coupling, but end up being useless because no one can understand them or how they work.<p>Learn the fundamentals so you <i>can</i> scale, but don't scale prematurely for its own sake.<p>I could be wrong, but I don't think I am.
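On the "transactions and ACID" point, here is a minimal sketch of atomicity using Python's standard-library `sqlite3` module. The `accounts` table, its `CHECK` constraint, and the `make_db`, `transfer`, and `balances` helpers are hypothetical examples. The transfer deliberately credits before it debits, so when the debit violates the constraint, you can see the already-applied credit get rolled back along with it.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the seed rows
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))  # True {'alice': 70, 'bob': 30}
print(transfer(conn, "bob", "alice", 50), balances(conn))  # False {'alice': 70, 'bob': 30}
```

Without the transaction, a crash or constraint failure between the two updates could leave money credited but never debited. That is the kind of guarantee you give up, or have to rebuild yourself, in a store that isn't ACID-compliant.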