I'm building a new service that lets users subscribe to each other's feeds. When publishers and consumers are both online (i.e. me and my friends), I use a message queue to take each published post and distribute it to my friends in real time. Where I'm struggling is how to handle friends who are not currently online. I can simply save new posts to the DB and pull them out when a user comes online, but I don't know the best way to retrieve an older post, look up the friends its creator has, and then send that post to those who are online while sending nothing to those who aren't. I can do it by storing more in the DB, but all the table lookups needed just to figure out which message goes where become very difficult and time-consuming.<p>So I'm wondering how Facebook or Twitter handle thousands of friends posting tens or hundreds of messages a day, and how they keep it all synchronized.<p>Thanks,
Roy
Twitter, LinkedIn and Facebook all make some bits and pieces of their systems available as open source. You might particularly want to look at Twitter's FlockDB[1], LinkedIn's Kafka[2], and/or Facebook's Scribe[3].<p>[1]: <a href="https://github.com/twitter/flockdb" rel="nofollow">https://github.com/twitter/flockdb</a><p>[2]: <a href="http://sna-projects.com/kafka/" rel="nofollow">http://sna-projects.com/kafka/</a><p>[3]: <a href="https://github.com/facebook/scribe" rel="nofollow">https://github.com/facebook/scribe</a><p>FlockDB is designed to deal with the "social graph" connection stuff (who follows whom, etc.) and is optimized for that sort of thing. Kafka and Scribe are more about delivering messages widely. Poke around, look at their code, chase down any associated papers, and you should be able to sort out an approach... even if you don't adopt any of those codebases directly. FWIW, I'm looking at integrating FlockDB for some stuff I'm working on, but I haven't gotten very far down that path yet, so I don't have a lot more to offer than this.
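To make the split between "who follows whom" and "delivering messages" a bit more concrete, here is a minimal in-memory sketch of one possible approach, often called fan-out on write: at publish time each post is either pushed to an online follower or appended to that follower's own inbox, so nothing has to be re-resolved against the friend graph when someone reconnects. This is only an illustration and not how FlockDB, Kafka, Scribe, Twitter or Facebook actually work. The `FeedService` class and all of its method names are made up for the example, and the dictionaries stand in for whatever graph store, queue and DB you really use.

```python
from collections import defaultdict, deque


class FeedService:
    """Hypothetical fan-out-on-write sketch; all names are illustrative."""

    def __init__(self):
        # publisher -> set of followers (stand-in for a social-graph store)
        self.followers = defaultdict(set)
        # user -> callback used for live delivery (stand-in for the queue)
        self.online = {}
        # user -> posts waiting for that user (stand-in for a DB table)
        self.inbox = defaultdict(deque)

    def follow(self, follower, publisher):
        self.followers[publisher].add(follower)

    def connect(self, user, deliver):
        """Mark the user online and flush anything stored while offline."""
        self.online[user] = deliver
        pending = self.inbox[user]
        while pending:
            deliver(pending.popleft())

    def disconnect(self, user):
        self.online.pop(user, None)

    def publish(self, publisher, text):
        """Resolve the follower list once, at publish time."""
        post = (publisher, text)
        for user in self.followers[publisher]:
            deliver = self.online.get(user)
            if deliver is not None:
                deliver(post)                  # online: push immediately
            else:
                self.inbox[user].append(post)  # offline: park in their inbox
```

With this layout, a user coming online is a single read of their own inbox rather than a walk over older posts and their creators' friend lists. The trade-off is more writes per post for publishers with many followers.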