The meat in the comments of Blaine Cook's blog entry:

http://romeda.org/blog/2008/05/scalability.html#1401411552478169860

"Scaling Twitter as a messaging platform is pretty easy. See Mickaël Rémond's post on the subject. Scaling the archival, and massive infrastructure concerns (think billions of authenticated polling requests per month) are not, no matter what platform you're on. Particularly when you need to take complex privacy concerns into account."

Sounds like they are having the kinds of problems that Friendster had years back.

How come sites like MySpace don't have these issues? They also seem to have a pretty complex social graph.
I have to admit that I am lacking in clue about Twitter.

Are they handling more than 64GB of user-generated data per hour? If not, why not just store everything in RAM on a big 128GB server and query that?
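A rough back-of-the-envelope check, using assumed numbers (not anything Twitter has published) for tweet volume and size, suggests the raw message data alone is nowhere near 64GB per hour:

    # Back-of-the-envelope estimate of raw tweet storage.
    # Both figures below are assumptions for illustration only.
    TWEETS_PER_DAY = 3_000_000        # assumed order of magnitude
    BYTES_PER_TWEET = 140 + 200       # text plus assumed metadata overhead

    bytes_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET
    gb_per_day = bytes_per_day / 1024**3

    print(f"~{gb_per_day:.2f} GB/day of raw tweet data")   # ~0.95 GB/day
    print(f"~{gb_per_day / 24:.3f} GB/hour")               # ~0.040 GB/hour

On those assumptions the storage itself is tiny; the billions of authenticated polling requests Cook mentions are a separate problem from where the bytes live.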
I am having an impossible time coming up with an unbiased and substantiated opinion on this issue. On the one hand, we know nothing about Twitter's architecture other than that they're (to some extent) using a language with a notoriously under-optimized interpreter and a framework with reported scaling issues (unless you do a lot of hacking on it).

On the other hand, there's gotta be something fishy going on in Twitter-ville. Although I agree that "the idea that building a large scale web application is trivial or a solved problem is simply ridiculous," by now Twitter should have enough performance data to know exactly which part of the process is causing the high-load issues they're having. If we assume that after each outage they at least "throw more hardware at it," then, theoretically, it's a problem that horizontal scaling cannot solve and the issue is deep-rooted somewhere in the basic architecture of the system.

Is it Ruby? RoR? Poorly optimized queries? Improper caching? Lack of domain knowledge? Leprechauns? I don't know, and neither does anyone else outside of Twitter... but I guess speculation can be entertaining.
I don't get why Twitter doesn't scale. It's just webmail, but with smaller messages and a simpler UI. Here's how Twitter should work: every user has a list of users following them. When they tweet, each follower gets a copy of that message in their personal inbox. A copy is also attached to the tweeter's account, so new followers can suck that copy in when they start following them.

That's it. Now sending a message takes O(n) time (n = followers), which is really cheap. On my machine, it takes about a second to create and sync 40,000 files (there's not much data, so replicating this via NFS wouldn't be that expensive either). With that out of the way, all you have to do is ls your "twitter directory" to see all of your friends' messages. This is another incredibly cheap operation. It's easy to distribute, and there's no locking.

Anyway, just look at the mail handling systems at huge universities and corporations. They scale fine, and they're much more complicated than Twitter. Twitter is just a subset of e-mail, so it should be implemented that way, not as a "SELECT * FROM tweets WHERE user IN (list, of, followers) ORDER BY date". That approach is wrong because it makes reads (very common) expensive in order to make writes (very uncommon) cheap. That's why Twitter doesn't scale.
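A minimal sketch of the fan-out-on-write idea described above, with in-memory dicts standing in for files or mailboxes (the names and structure are invented for illustration, not anything Twitter actually ran):

    from collections import defaultdict

    # follower graph and per-user inboxes; a real system would use files,
    # mailboxes, or a distributed store rather than dicts
    followers = defaultdict(set)   # author -> set of follower ids
    inboxes = defaultdict(list)    # user   -> tweets delivered to them
    outboxes = defaultdict(list)   # author -> tweets kept for new followers

    def follow(follower, author):
        followers[author].add(follower)
        # new followers pull the author's existing tweets into their inbox
        inboxes[follower].extend(outboxes[author])

    def tweet(author, text):
        message = (author, text)
        outboxes[author].append(message)
        # fan out on write: O(number of followers), paid once per tweet
        for follower in followers[author]:
            inboxes[follower].append(message)

    def timeline(user):
        # reading is just listing your own inbox -- no joins, no locking
        return inboxes[user]

    follow("alice", "bob")
    tweet("bob", "hello, world")
    print(timeline("alice"))   # [('bob', 'hello, world')]

The tradeoff is exactly the one named above: writes cost O(followers), but reads, which dominate the workload, become a plain scan of your own inbox.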
Of course all of you Hacker News Monday morning quarterbacks have taken a website from launch to Twitter's current level of traffic, right? If so, please let us know which site that was so we can compare your experience and your decisions against those of the Twitter team. If not, perhaps you should go and do that first before sounding off on Twitter. No, I'm not a friend of Blaine Cook's, nor am I a Twitter apologist -- I don't even use the service. But I do respect startup founders and builders far more than their critics, and I know that website architecture -- like most things -- is always easier in hindsight.
For perspective, please read the Hot or Not story in "Founders at Work," then read Teddy Roosevelt's "Man in the Arena" quotation that Arrington is so fond of. (And yes I realize that reference is ironic given that Arrington started the pile-on against Blaine Cook. Arrington should try to remember that "it's not the critic that counts.")
I doubt it's the language. Sure, you might get more efficient throughput with another framework or language, but I think the core problem lies in how many polling API connections they have. If their API hits their core DB instead of polling a read-only copy, they have a serious design flaw.

API connections, given that they have to do a security check each time, should be served from a read-only slave copy of the DB that is near-realtime -- or even a potentially dirty read, but who cares, it's for their API. The security lookup should be cached, since you rarely change API security accounts more than once; if you do, then you update the caches, etc.

Granted, I'm no expert, but it just sounds like they're overloading their DB with polling.

Their web pages for each user should be cached, since they're not really hit as much as the API or RSS.

I don't care what anyone says: having THAT many API connections constantly polling their DB, along with what Blaine said about having to do authentication requests on every single hit, is taxing on any setup you can put together.

Oh, and if they have a single table holding all their tweets, big issues there. I'd have 26 "tweet" tables, one for each letter of the alphabet, and switch in the business layer. Then simply ship off tweets older than a month, or partition based on that.
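A rough sketch of the two suggestions above -- serving API polls from a read replica and caching the security lookup -- where the class, method names, and TTL are all invented for illustration:

    import time

    class ReplicaDB:
        # Stand-in for a near-realtime, read-only copy of the main database.
        # The method names here are hypothetical.
        def lookup_api_key(self, api_key):
            return "user-for-" + api_key          # pretend credential check
        def recent_tweets(self, username):
            return [f"latest tweet by {username}"]  # pretend timeline query

    replica_db = ReplicaDB()

    AUTH_TTL = 300          # seconds to trust a cached credential check
    _auth_cache = {}        # api_key -> (user_id, expires_at)

    def authenticate(api_key):
        # hit the DB only on a cache miss or after the entry expires
        entry = _auth_cache.get(api_key)
        if entry and entry[1] > time.time():
            return entry[0]
        user_id = replica_db.lookup_api_key(api_key)
        _auth_cache[api_key] = (user_id, time.time() + AUTH_TTL)
        return user_id

    def invalidate_auth(api_key):
        # call this when an account's API credentials change
        _auth_cache.pop(api_key, None)

    def api_timeline(api_key, username):
        authenticate(api_key)
        # polling reads go to the replica; a slightly stale answer is fine
        return replica_db.recent_tweets(username)

    print(api_timeline("abc123", "bob"))

The point is simply that the authenticated polling traffic never needs to touch the primary database at all.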
Common usage of the word "scale" is that things continue to work well as you add load. Whether that's done by making the software fast enough to handle everything on one computer, by adding servers, or by adding pet monkeys is not important to the user.

Responding to reasonable complaints by using a different definition of the word "scale" makes for a weak argument.
The web frontend could just be a static list of your contacts, with ajaxy grabbing of the tweets from each. This way, you could cache each user's tweets and avoid the issue of each user's page being different.

That's just one idea. I'm not really an expert.
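One way to read that idea, sketched with an invented in-process cache (a real deployment would presumably put HTTP caching or memcached in front of per-author tweet URLs):

    import time

    CACHE_TTL = 60                 # seconds an author's tweet list stays cached
    _tweet_cache = {}              # username -> (tweets, expires_at)

    def load_tweets_from_db(username):
        # placeholder for the real query; only runs on a cache miss
        return [f"latest tweet by {username}"]

    def tweets_for(username):
        # One author's tweets are an independently cacheable unit: they
        # depend only on the author, not on who is viewing them, so every
        # follower's page can reuse the same cached copy.
        cached = _tweet_cache.get(username)
        if cached and cached[1] > time.time():
            return cached[0]
        tweets = load_tweets_from_db(username)
        _tweet_cache[username] = (tweets, time.time() + CACHE_TTL)
        return tweets

    # the "page" is then just a loop over your static contact list
    for contact in ["alice", "bob"]:
        print(contact, tweets_for(contact))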
I have several questions, because I feel there should be a better systems approach to this Twitter problem.

Is there any publicly available documentation for Twitter's architecture?

Did they use consultant help? Did they contact Sun, IBM, Oracle, or any other respected consultant when they started facing these problems?

I recommend you watch this video: http://www.infoq.com/presentations/qcon-voca-architecture-spring

Really... we need to have a look at Twitter's architecture before we discuss this further.
Unless you know what you need to scale to, you can't even begin to talk about scalability. How many users do you want your system to handle? A thousand? A hundred thousand? Ten million? Here's a hint: the system you design to handle a quarter million users is going to be different from the system you design to handle ten million users.

http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html
If he can't scale a simple message-passing system, what can he scale? It's not like they're doing rocket science at Twitter. He probably was the wrong person for the job.

In the good old days, every message over ICQ was sent through their servers, and they probably had more messages to handle than Twitter does nowadays. And there wasn't a single problem then.
I think the problem with Twitter is that they did not use a real-time architecture like email, IRC, or some other messaging platform. Instead, I think they generate each page dynamically off a DB call based on who is subscribed where -- i.e., they would have to start from scratch to scale it. I think the problem is that now that Twitter has let the cat out of the bag, they are chasing problems on an old architecture and trying to scale that, while also trying to build a new version. I suppose it's like running two tech companies at once.