I found this part interesting: "F4 currently stores over 65PB of logical data and saves over 53PB of storage."<p>It looks like most of the savings come from tuning the storage replication level to the data's usage patterns.<p>I also found it interesting that it doesn't talk about CDN usage.
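The replication-level point can be made concrete with a little arithmetic. Below is a rough Python sketch. It assumes the effective replication factors reported in the f4 paper: about 3.6x for Haystack (three copies, each on RAID-6 at roughly 1.2x) and, for f4, Reed-Solomon(10,4) at 1.4x per cell, either mirrored across two regions (2.8x) or XOR-protected across three regions (2.1x). Those factors don't appear in the thread, so treat them as assumptions. The function and constant names are illustrative only.

```python
# Back-of-the-envelope storage arithmetic for replication schemes.
# ASSUMPTION: the factors below follow the f4 paper's reported numbers;
# they are not stated in this thread.

def rs_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Physical bytes stored per logical byte for a Reed-Solomon stripe."""
    return (data_blocks + parity_blocks) / data_blocks

HAYSTACK_FACTOR = 3 * 1.2            # 3 copies, each on RAID-6 (~1.2x)
F4_CELL = rs_overhead(10, 4)         # 1.4x within one datacenter
F4_TWO_REGION = F4_CELL * 2          # full copy in a second region -> 2.8x
F4_XOR_THREE_REGION = F4_CELL * 1.5  # XOR of two volumes kept in a third region -> 2.1x

def savings_pb(logical_pb: float, old_factor: float, new_factor: float) -> float:
    """Physical PB saved by moving logical_pb from old_factor to new_factor."""
    return logical_pb * (old_factor - new_factor)

if __name__ == "__main__":
    logical = 65.0  # PB of logical data, from the quoted figure
    for name, factor in [("2-region f4", F4_TWO_REGION),
                         ("XOR f4", F4_XOR_THREE_REGION)]:
        saved = savings_pb(logical, HAYSTACK_FACTOR, factor)
        print(f"{name}: {factor:.1f}x, saves ~{saved:.1f} PB")
```

Under these assumptions, 65PB of logical data saves about 52PB at 2.8x and 97.5PB at 2.1x. The quoted 53PB falls between the two, which suggests that a change in replication factor alone can plausibly account for savings of that size.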
Excellent article. I like the architecture explanation. I also like the simplicity of the description; it makes the decision-making process really easy to understand.<p>I think it is a very logical solution. Naturally, when I think of Facebook, it's the freshness of the data that matters to me: as a user I want to know what's going on now, not what happened a while ago. So it seems quite logical that managing their data this way is much more efficient.
On this topic, where do you guys learn about the architecture of large-scale web services and the latest trends?<p>I read High Scalability and the occasional company blog. Is there anything else out there that might be even better? Blogs, books, forums, it doesn't matter.
Great post, thanks! I love it when someone summarizes the good stuff :D
I wish someone would do the same for the AWS cloud papers.<p>Wow, TIL what TAO is and what it can do.<p>This part is very sexy to me:
Facebook's new architecture splits the media into two categories:
1) hot/recently-added media, which is still stored in Haystack, and
2) warm media (still not cold), which is now stored in F4 storage and not in Haystack.
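The split described in the quote is essentially age-based tiering: newer media stays in Haystack and older media moves to F4. Below is a minimal Python sketch of how such a routing rule might look. The 90-day cutoff, the `Blob` type, and the function names are all hypothetical and are not taken from the article or the papers. The real migration criteria may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# HYPOTHETICAL cutoff: the thread doesn't say when media stops being "hot".
WARM_AFTER = timedelta(days=90)

@dataclass
class Blob:
    blob_id: str
    created_at: datetime

def storage_tier(blob: Blob, now: datetime,
                 warm_after: timedelta = WARM_AFTER) -> str:
    """Return 'haystack' for hot/recent media, 'f4' for warm media."""
    age = now - blob.created_at
    return "f4" if age >= warm_after else "haystack"

def blobs_to_migrate(blobs: list[Blob], now: datetime,
                     warm_after: timedelta = WARM_AFTER) -> list[str]:
    """IDs of blobs old enough to move from Haystack to F4."""
    return [b.blob_id for b in blobs
            if storage_tier(b, now, warm_after) == "f4"]

if __name__ == "__main__":
    now = datetime(2014, 10, 1, tzinfo=timezone.utc)
    blobs = [
        Blob("new-photo", now - timedelta(days=3)),
        Blob("old-photo", now - timedelta(days=400)),
    ]
    print(blobs_to_migrate(blobs, now))  # ['old-photo']
```

The rule only looks at age, which matches the comment's point that freshness drives where the data lives.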
The title of his blog post, "Facebook's software architecture", is misleading.
Actually, these papers are about data storage and databases, not "software architecture."
That said, his blog is informative and consistent. Love it!