If anybody is interested in random Twitter internal stuff I might bang out a Medium post one day. I was neck deep in the infra side of things for years and have all sorts of funny stories.

Our managed hosting provider wouldn't let us use VPNs or anything that allowed direct access to the managed network they provided, but we wanted to build internal-only services that were not on the internet, so I set up a simple little system that used DNS to point at private address space in the office and an SSH tunnel to forward the ports to the right places. Worked great, but over time the internal stuff grew up, and our IT team refused to let me have a server in the office, so it was all running off a pair of Mac Minis. We called them the "load-bearing Mac Minis" since basically 90% of the production management traffic went over the SSH tunnels they hosted. =)
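For anyone curious what that kind of setup looks like, here's a rough sketch of a persistent SSH local forward like the ones the Mac Minis would have run. The hostnames, ports, and jump host are entirely made up; the real details aren't public, so treat this as illustrative only.

    # Hypothetical sketch of a persistent SSH port forward in the spirit of the
    # "load-bearing Mac Mini" setup: internal DNS names resolve to the office box,
    # which forwards each port over SSH into the managed hosting environment.
    # Hostnames, ports, and the SSH endpoint are all made up.
    import subprocess
    import time

    FORWARDS = [
        # (listen_port_in_the_office, host_inside_managed_hosting, remote_port)
        (8443, "admin.internal.example", 443),
        (9090, "metrics.internal.example", 9090),
    ]

    def run_tunnel():
        args = ["ssh", "-N"]  # -N: no remote command, forwarding only
        for lport, rhost, rport in FORWARDS:
            # Bind on all interfaces so other office machines can use the forward.
            args += ["-L", f"0.0.0.0:{lport}:{rhost}:{rport}"]
        args.append("gateway.managed-hosting.example")  # hypothetical SSH endpoint
        return subprocess.run(args)

    if __name__ == "__main__":
        while True:  # crude "load-bearing" loop: restart the tunnel whenever it drops
            run_tunnel()
            time.sleep(5)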
2010: "At any moment, Justin Bieber uses 3% of our infrastructure. Racks of servers are dedicated to him"<p><a href="https://gizmodo.com/5632095/justin-bieber-has-dedicated-servers-at-twitter" rel="nofollow">https://gizmodo.com/5632095/justin-bieber-has-dedicated-serv...</a>
If you want to read more about activity feeds, there are a ton of papers listed here: https://github.com/tschellenbach/stream-framework
I've been working on this stuff for years. Recently I've also enjoyed reading LinkedIn's posts about their feed tech. There are a few different posts, but here's one of them: https://engineering.linkedin.com/blog/2016/03/followfeed--linkedin-s-feed-made-faster-and-smarter

Scaling a social network is just inherently a very hard problem, especially if you have a large userbase with a few very popular users. StackShare recently did a nice blog post about how we at Stream solve this for 300 million users with Go, RocksDB and Raft: https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users

I think the most important part is using a combination of push and pull. You keep the most popular users in memory and pull their posts at read time, and for everyone else you use the traditional fanout-on-write approach (see the sketch below). The other thing that helped us scale was using Go + RocksDB. The throughput is just so much higher compared to traditional databases like Cassandra.

It's also interesting to note how other companies solved it. Instagram used a fanout-on-write approach with Redis, later on Cassandra, and eventually a flavor of Cassandra based on RocksDB. They managed to use a full fanout approach through a combination of great optimization, a relatively lower posting volume (compared to Twitter at least) and a ton of VC money.

Friendster and Hyves are two stories of companies that didn't really manage to solve this and went out of business. (There were probably other factors as well, but still.) I also heard one investor mention how Tumblr struggled with technical debt related to their feed. A more recent example is Vero, which basically collapsed under scaling issues.
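To make the push/pull split concrete, here's a tiny in-memory sketch of the write and read paths, assuming a made-up follower-count threshold. It's illustrative only, not how Stream (or Twitter) actually implements it.

    # Hypothetical in-memory sketch of a hybrid push/pull feed. A real system
    # would persist these structures in something like RocksDB or Redis; the
    # threshold and data layout here are purely illustrative.
    from collections import defaultdict, deque

    CELEBRITY_THRESHOLD = 10_000  # made-up cutoff for "too popular to fan out"

    class HybridFeed:
        def __init__(self):
            self.followers = defaultdict(set)       # author -> follower ids
            self.following = defaultdict(set)       # user -> authors they follow
            self.timelines = defaultdict(deque)     # user -> pushed (fanned-out) posts
            self.author_posts = defaultdict(deque)  # author -> own posts (pull side)

        def follow(self, user, author):
            self.followers[author].add(user)
            self.following[user].add(author)

        def post(self, author, ts, text):
            post = (ts, author, text)
            self.author_posts[author].appendleft(post)
            # Push path: fan out on write only for "normal" authors.
            if len(self.followers[author]) < CELEBRITY_THRESHOLD:
                for follower in self.followers[author]:
                    self.timelines[follower].appendleft(post)

        def read_timeline(self, user, limit=50):
            # Pull path: fetch popular authors' posts at read time and merge them in.
            merged = list(self.timelines[user])
            for author in self.following[user]:
                if len(self.followers[author]) >= CELEBRITY_THRESHOLD:
                    merged.extend(self.author_posts[author])
            merged.sort(key=lambda p: p[0], reverse=True)
            return merged[:limit]

The point is that write amplification stays bounded when a very popular user posts, while a normal read is still mostly a cheap lookup of the precomputed timeline plus a small merge.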
I haven't seen anyone touch on this, but I remember reading about it in Designing Data-Intensive Applications [1]. The way they solved the celebrity feed issue was to decouple users with very large follower counts from normal users.

Here is a quick excerpt; this book is filled to the brim with these gems.

> The final twist of the Twitter anecdote: now that approach 2 is robustly implemented, Twitter is moving to a hybrid of both approaches. Most users’ tweets continue to be fanned out to home timelines at the time when they are posted, but a small number of users with a very large number of followers (i.e., celebrities) are excepted from this fan-out. Tweets from any celebrities that a user may follow are fetched separately and merged with that user’s home timeline when it is read, like in approach 1. This hybrid approach is able to deliver consistently good performance.

Approach 1 keeps tweets in a global collection; a user's timeline is assembled at read time by looking up everyone they follow, fetching their tweets and merging them by time.

Approach 2 posts each tweet into every follower's cached home timeline on write, which works a lot like a mailbox (see the read-path sketch below).

[1] https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?ie=UTF8&qid=1527213498&sr=8-1&keywords=data+intensive+application&dpID=51PjhtI9VRL&preST=_SX218_BO1,204,203,200_QL40_&dpSrc=srch
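To illustrate the read side of that hybrid, here's a small sketch of merging the precomputed "mailbox" timeline (approach 2) with tweets pulled from followed celebrities at read time (approach 1). The function and data are made up, not Twitter's actual code.

    # Hypothetical read-time merge for the hybrid approach: the pushed "mailbox"
    # timeline is combined with feeds pulled live from followed celebrities.
    import heapq
    import itertools

    def read_home_timeline(mailbox, celebrity_feeds, limit=50):
        # Every input is a list of (timestamp, author, text) sorted newest-first,
        # so heapq.merge can do a lazy k-way merge without re-sorting everything.
        newest_first = lambda post: -post[0]
        merged = heapq.merge(mailbox, *celebrity_feeds, key=newest_first)
        return list(itertools.islice(merged, limit))

    # Made-up data: the reader's pushed mailbox plus two pulled celebrity feeds.
    mailbox = [(1700000300, "friend_a", "lunch pics"), (1700000100, "friend_b", "hi")]
    bieber  = [(1700000400, "bieber", "new single"), (1700000200, "bieber", "on tour")]
    other   = [(1700000250, "celeb_x", "hello world")]

    print(read_home_timeline(mailbox, [bieber, other], limit=3))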
This isn't shocking - Twitter was notorious for being technically held together with Scotch tape.

Honestly, this hands-on approach is an impressive example of doing things that don't scale.
There was a fun High Scalability article about their 'fan-out' approaches to disseminating tweets from popular users and the like: http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html

When I was working on something with similar technical requirements I also came across this paper, which outlines the approach in a more 'formal' manner: http://jeffterrace.com/docs/feeding-frenzy-sigmod10-web.pdf
Ah, I read that Twitter thread a few days (weeks?) ago and it was much longer. As far as I remember, it started with someone asking Twitter ops people, former and current, to share some stories about things that went spectacularly wrong.

It contained a lot of Twitter ops battle stories, some very interesting. I was pretty impressed to read about Twitter internals in that level of detail, but now it seems that the thread that held them all together is protected (the author probably didn't expect it to be so popular, or just wanted to continue more privately).
In the early days of Twitter the "fail whale" was so common that it got assimilated into the culture as a term for any time a site gets overloaded. Nowadays it seems like that term is "hugged to death".

https://www.theatlantic.com/technology/archive/2015/01/the-story-behind-twitters-fail-whale/384313/
Missed that this went to the front page! I will answer questions if I can.

I am now CEO @ https://fauna.com/, making whales and fails a thing of the past for everybody. We are hiring, if you want to work on distributed databases.
Fun fact: Twitter's feed is still kinda broken. If you visit the site after being gone for a week or so, it tells you your timeline is empty. It recovers after a few minutes, but it's still a pretty poor user experience.
What frameworks would you use to handle such a steep growth curve? Most startups I know of start off with Rails or the like, and obviously they couldn't handle the strain. So what would you use?
> Jason Goldman, who served as Vice President of Product at Twitter between 2007 and 2010, responded to Weaver’s tweets with the observation that early Twitter was “held together by sheer force of will.”

I would dispute that; I don't think they can take that much credit. Regardless of their "sheer force of will", the site was down very, very frequently.