
Facebook's Top Open Data Problems

180 points by huangwei_chang, over 10 years ago

10 comments

EricBurnett, over 10 years ago
I strongly dislike Facebook the product, and to a lesser extent Facebook the company, but I'm continually impressed with Facebook's approach to engineering in the open. I find this an interesting dichotomy. Would I want to work there? I still don't think so, but my opinion on that front is getting less strong over time.
ransom1538, over 10 years ago
I had a really great time talking to the Facebook engineers during my interviews there. The main pattern I noticed was Harvard (I was applying in management). Even more so, the guys interviewing me were extremely talented and smart. What always weirded me out was that the problems they work on are not that difficult. Once you grasp sharding and operations, you are pretty much set. These guys are *not* the Manhattan Project. The truly hard problems in their space (developing their own mobile hardware, keeping teens engaged, pushing the boundaries of design, losing tracking systems in mobile, etc.) they don't face head on. Moving petabytes around or caching lots of things in memcache is something my roommate and I could do with an AWS account and a few beers. Memcache, for God's sake, is what, 300 lines of C?
mandeepj, over 10 years ago
Can anybody shed some light on how Facebook's database is designed? I am sure it will be an interesting read.

I read somewhere a while back that each user at FB has their own database. I don't think that is possible.

Edit: I am googling this topic again now. The first link I found is http://www.quora.com/What-is-Facebooks-database-schema
crazypyro, over 10 years ago
This is slightly off topic, but has anyone experienced an increase in "fake" toasts from Facebook mobile? It seems that if I haven't used Facebook mobile in a few days, or I don't respond to their toasts about very minor people in my life uploading a photo, I tend to start getting toasts that say "You have 5 notifications, 3 pokes and 2 messages." Then I open the app and it takes me to an unknown error page.

Am I being too cynical in thinking that Facebook is intentionally misleading its users in an attempt to bump up their metrics? It interests me that they are seeing jumps in their mobile users (and consequently, ad sales) at the same time that I have been receiving more notifications than ever. Interestingly, the slowdown in fake toast notifications coincided with their quarterly earnings report, which shows mobile ads accounting for an increasingly large portion of revenue and also mentions an increase in mobile usage.

Comparing Q1 with Q2 with Q3, Q2-Q3 showed double the increase in ad revenue percent from mobile (59% to 62% to 66%). Maybe this is all just anecdotal evidence, but it seems like these sorts of fake notifications either should not be sent out (a failure of the system that keeps track of which user receives which toasts) or there was a conscious effort to send them.
beagle3, over 10 years ago
Something does not add up about Hive: they say it has 300 PB and generates 4 PB per day, which means that, at this rate, all of the data was generated within the last 75 days.
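
The 75-day figure can be checked with a quick sketch. It divides the two numbers quoted from the article and assumes a constant generation rate with no deletion, expiry, or compression, none of which the article states. The function name is made up for illustration.

```python
# Figures quoted in the thread from the article.
TOTAL_PB = 300       # stated size of the Hive warehouse
NEW_PB_PER_DAY = 4   # stated daily data generation

def days_to_accumulate(total_pb: float, rate_pb_per_day: float) -> float:
    """Days needed to reach total_pb at a constant rate_pb_per_day.

    Assumption (not from the article): the rate is constant and nothing
    is ever deleted, expired, or compressed.
    """
    return total_pb / rate_pb_per_day

days = days_to_accumulate(TOTAL_PB, NEW_PB_PER_DAY)
print(days)  # 75.0
```

If the 4 PB per day counts data before compression or retention policies are applied, the two figures are measuring different things and the division above would not apply directly.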
Cakez0r, over 10 years ago
I'm really curious how they handle paging if they're only using memcached. E.g., if a photo node has 10,000 comment nodes (and thus 10,000 edges linking the photo to the comments), chances are you only want to display the most recent 50 comments. Are all of the 10,000 edges stored in memcached under one key and then paged on the application servers? Are they stored in chunks under multiple keys? How is cache consistency maintained if somebody makes a new comment (maintaining the time ordering seems tricky and expensive)?

This is a problem I'm actively trying to solve for a project, so if somebody knows the answer, please get in touch!
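
One of the options raised above, storing edges in chunks under multiple keys, can be sketched as follows. This is only an illustration of that idea, not a description of how Facebook does it. A plain `dict` stands in for memcached, and the key layout, the `ChunkedEdgeList` class, and the chunk size of 50 are all hypothetical.

```python
CHUNK_SIZE = 50  # hypothetical: one chunk holds one page of edges


class ChunkedEdgeList:
    """Time-ordered edge list split into fixed-size chunks in a key-value cache.

    `cache` is a plain dict standing in for memcached (an assumption made
    so the sketch runs without a server).
    """

    def __init__(self, cache: dict, chunk_size: int = CHUNK_SIZE):
        self.cache = cache
        self.chunk_size = chunk_size

    def _count_key(self, node_id: str) -> str:
        return f"edges:{node_id}:count"

    def _chunk_key(self, node_id: str, index: int) -> str:
        return f"edges:{node_id}:chunk:{index}"

    def append(self, node_id: str, edge_id: int) -> None:
        """Add a new edge; only the newest chunk and the counter are written."""
        count = self.cache.get(self._count_key(node_id), 0)
        key = self._chunk_key(node_id, count // self.chunk_size)
        self.cache[key] = self.cache.get(key, []) + [edge_id]
        self.cache[self._count_key(node_id)] = count + 1

    def most_recent(self, node_id: str, limit: int) -> list:
        """Return up to `limit` edges, newest first, reading newest chunks first."""
        count = self.cache.get(self._count_key(node_id), 0)
        if count == 0 or limit <= 0:
            return []
        result = []
        index = (count - 1) // self.chunk_size
        while index >= 0 and len(result) < limit:
            chunk = self.cache.get(self._chunk_key(node_id, index), [])
            result.extend(reversed(chunk))
            index -= 1
        return result[:limit]


edges = ChunkedEdgeList(cache={})
for comment_id in range(10_000):
    edges.append("photo:1", comment_id)

recent = edges.most_recent("photo:1", 50)
print(recent[0], recent[-1], len(recent))  # 9999 9950 50
```

In this layout a new comment touches one chunk and one counter, so older chunks never need rewriting and time ordering falls out of the append order. The sketch ignores concurrent writers, eviction, and cache misses; a real deployment would need atomic updates and a fallback to the backing store.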
swah, over 10 years ago
I'd like to use this opportunity to ask: is it a technical limitation that users still can't search their timeline?
mmmooo, over 10 years ago
So ~650M daily active users and 4 PB of data warehouse data created each day; that means ~7 MB of new data on each active user per day. Given that it's a data warehouse, I'm going to guess it's not images. Seems like a lot to me. I guess it shouldn't surprise anyone that every interaction, on and off the site, is heavily tracked.
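
The per-user estimate can be reproduced with a short sketch. The inputs are the figures quoted above; whether "PB" means 10^15 or 2^50 bytes is not stated in the thread, so both are computed. The function name is made up for illustration.

```python
DAILY_ACTIVE_USERS = 650_000_000  # approximate figure quoted in the comment
NEW_PB_PER_DAY = 4                # figure quoted from the article

def mb_per_user_per_day(pb_per_day: float, users: int, bytes_per_pb: int) -> float:
    """New warehouse data per active user per day, in megabytes (10**6 bytes)."""
    return pb_per_day * bytes_per_pb / users / 10**6

# Assumption: decimal petabytes (10**15 bytes).
decimal_mb = mb_per_user_per_day(NEW_PB_PER_DAY, DAILY_ACTIVE_USERS, 10**15)

# Assumption: binary petabytes (2**50 bytes).
binary_mb = mb_per_user_per_day(NEW_PB_PER_DAY, DAILY_ACTIVE_USERS, 2**50)

print(round(decimal_mb, 2), round(binary_mb, 2))
```

Under either reading the result is roughly 6 to 7 MB per active user per day, in line with the estimate above.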
doque, over 10 years ago
"3. Hive is Facebook's data warehouse, with 300 petabytes of data in 800,000 tables. Facebook generates 4 new petabytes of data and runs 600,000 queries and 1 million map-reduce jobs per day."

So 4 PB per day, but only 300 PB total?
Thaxll, over 10 years ago
Still using Memcache, wow.