
Google Finds NUMA Up to 20% Slower for Gmail and Websearch

78 points by streeter almost 12 years ago

7 comments

scott_s almost 12 years ago

One quibble: the author at High Scalability refers to the paper's authors collectively as "Google," but the lead authors, Lingjia Tang and Jason Mars, are professors at UC San Diego. Of course, they must have collaborated with Google, and they may have done the work during summer internships in 2011 (CVs are at http://www.lingjia.org/ and http://jasonmars.org/).
mtdewcmu almost 12 years ago

I'm having a little trouble making sense of this:

"For example, bigtable benefits from cache sharing and would prefer 100% remote accesses to 50% remote. Search-frontend prefers spreading the threads to multiple caches to reduce cache contention and thus also prefers 100% remote accesses to 50% remote."

Let me see if I've got this straight:

* bigtable benefits from scheduling related threads on the same CPU so they can share a cache, I'm guessing because multiple threads work on the same data simultaneously

* search benefits from having its threads spread over many CPUs, probably because the threads are unrelated to each other and not sharing data, so they like to have their own caches

I'm not sure I understand how this relates to NUMA, or why remote accesses are ever a good thing. Maybe it requires a more sophisticated understanding of computer architecture than I have.
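The effect mtdewcmu is puzzling over can be put in numbers with a toy latency model (all figures here are hypothetical, not from the paper): when cache behavior dominates, a configuration with 100% remote DRAM accesses but a better cache hit rate can beat one with only 50% remote accesses.

```python
def avg_access_ns(remote_frac, cache_hit,
                  cache_ns=10, local_ns=100, remote_ns=160):
    """Average memory access time: cache hits are cheap; misses pay a
    blend of local and remote DRAM latency (all latencies hypothetical)."""
    dram_ns = (1 - remote_frac) * local_ns + remote_frac * remote_ns
    return cache_hit * cache_ns + (1 - cache_hit) * dram_ns

# Clustering search threads on one socket: only 50% of DRAM accesses are
# remote, but the shared cache is contended, so the hit rate drops.
clustered = avg_access_ns(remote_frac=0.5, cache_hit=0.70)  # 46.0 ns

# Spreading across sockets: 100% remote DRAM for half the threads, but
# each thread keeps a larger share of cache to itself.
spread = avg_access_ns(remote_frac=1.0, cache_hit=0.85)     # 32.5 ns

print(f"clustered={clustered:.1f} ns, spread={spread:.1f} ns")
```

With these made-up numbers, spreading wins despite paying full remote latency on every miss; flip the hit rates (a sharing-friendly workload like bigtable) and clustering wins instead, which is the paper's point that neither placement is universally better.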
rpearl almost 12 years ago
Given Google's ability to obtain processors before they are available to the public, and given that this paper refers to AMD's Barcelona processors, the results published here are probably approximately seven years out-of-date, and it's not clear whether they're still relevant now.
mckilljoy almost 12 years ago

I like reading these analyses, although I'm afraid headlines like this oversimplify things and give the wrong impression. There isn't anything inherently wrong with NUMA; it just isn't useful in this situation.

No technology is a 'silver bullet'. Every workload has a different set of considerations that require a different set of technologies to optimize.
chad_walters almost 12 years ago

The title is not just misleading -- it is just plain wrong.

NUMA was 15% better for Gmail and 20% better for the Web search frontends, as indicated by the reductions (improvements) in CPI for these workloads.

There were some workloads where NUMA did degrade performance, such as BigTable accesses (12% regression).
lallysingh almost 12 years ago

Specifically: "in multicore multisocket machines, there is often a tradeoff between optimizing NUMA performance by clustering threads close to the memory nodes to increase the amount of local accesses and optimizing for cache performance by spreading threads to reduce the cache contention"

I.e., the performance benefit from socket-local memory accesses may not be worth running every thread that uses that memory on that socket's CPUs, because each thread would get too small a share of the cache.
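On Linux, the "clustering" lallysingh describes ultimately comes down to a CPU-affinity decision. A minimal sketch (the core chosen here is just whatever happens to be first in the current mask; real socket topology would come from `lscpu` or `/sys/devices/system/node`, and `os.sched_setaffinity` is Linux-only):

```python
import os

# Current set of CPUs this process may run on.
allowed = os.sched_getaffinity(0)

# "Cluster": restrict the process to a single core as a stand-in for
# one socket's cores -- NUMA-local DRAM, but a shared, contended cache.
one_core = {min(allowed)}
os.sched_setaffinity(0, one_core)
print(os.sched_getaffinity(0))

# Restoring the full mask corresponds to "spreading" the threads again.
os.sched_setaffinity(0, allowed)
```

A production scheduler would make this choice per workload (and would use `numactl`/libnuma to steer memory placement as well as CPU placement), which is exactly the tradeoff the paper is measuring.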
hollerith almost 12 years ago

Up to 20% slower than what?

(Than SMP systems, I guess, but the OP does not say.)