
Google Finds NUMA Up to 20% Slower for Gmail and Websearch

78 points by streeter almost 12 years ago

7 comments

scott_s almost 12 years ago

One quibble: the author at High Scalability refers to the paper's authors collectively as "Google," but the lead authors, Lingjia Tang and Jason Mars, are professors at UC San Diego. Of course, they must have collaborated with Google, and they may have done the work during summer internships in 2011 (CVs are at http://www.lingjia.org/ and http://jasonmars.org/).
mtdewcmu almost 12 years ago

I'm having a little trouble making sense of this:

"For example, bigtable benefits from cache sharing and would prefer 100% remote accesses to 50% remote. Search-frontend prefers spreading the threads to multiple caches to reduce cache contention and thus also prefers 100% remote accesses to 50% remote."

Let me see if I've got this straight:

* bigtable benefits from scheduling related threads on the same CPU so they can share a cache, I'm guessing because multiple threads work on the same data simultaneously

* search benefits from having its threads spread over many CPUs, probably because the threads are unrelated to each other and not sharing data, so they like to have their own caches

I'm not sure I understand how this relates to NUMA, or why remote accesses are ever a good thing. Maybe it requires a more sophisticated understanding of computer architecture than I have.
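The effect mtdewcmu is puzzling over can be put in numbers with a toy latency model (all figures here are hypothetical, not from the paper): when cache behavior dominates, a configuration with 100% remote DRAM accesses but a better cache hit rate can beat one with only 50% remote accesses.

```python
def avg_access_ns(remote_frac, cache_hit,
                  cache_ns=10, local_ns=100, remote_ns=160):
    """Average memory access time: cache hits are cheap; misses pay a
    blend of local and remote DRAM latency (all latencies hypothetical)."""
    dram_ns = (1 - remote_frac) * local_ns + remote_frac * remote_ns
    return cache_hit * cache_ns + (1 - cache_hit) * dram_ns

# Clustering search threads on one socket: only 50% of DRAM accesses are
# remote, but the shared cache is contended, so the hit rate drops.
clustered = avg_access_ns(remote_frac=0.5, cache_hit=0.70)  # 46.0 ns

# Spreading across sockets: 100% remote DRAM for half the threads, but
# each thread keeps a larger share of cache to itself.
spread = avg_access_ns(remote_frac=1.0, cache_hit=0.85)     # 32.5 ns

print(f"clustered={clustered:.1f} ns, spread={spread:.1f} ns")
```

With these made-up numbers, spreading wins despite paying full remote latency on every miss; flip the hit rates (a sharing-friendly workload like bigtable) and clustering wins instead, which is the paper's point that neither placement is universally better.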
rpearl almost 12 years ago
Given Google's ability to obtain processors before they are available to the public, and given that this paper refers to AMD's Barcelona processors, the results published here are probably approximately seven years out-of-date, and it's not clear whether they're still relevant now.
mckilljoy almost 12 years ago

I like reading these analyses, although I'm afraid headlines like this oversimplify things and give the wrong impression. There isn't anything inherently wrong with NUMA; it just isn't useful in this situation.

No technology is a 'silver bullet'. Every workload has a different set of considerations that require a different set of technologies to optimize.
chad_walters almost 12 years ago

The title is not just misleading -- it is just plain wrong.

NUMA was 15% better for Gmail and 20% better for the Web search frontends, as indicated by the reductions (improvements) in CPI for these workloads.

There were some workloads where NUMA did degrade performance, such as BigTable accesses (12% regression).
lallysingh almost 12 years ago

Specifically: "in multicore multisocket machines, there is often a tradeoff between optimizing NUMA performance by clustering threads close to the memory nodes to increase the amount of local accesses and optimizing for cache performance by spreading threads to reduce the cache contention"

I.e., the performance benefit from socket-local memory accesses may not be worth running every thread that uses that memory on that socket's CPUs, because each thread would get too small a share of the cache.
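On Linux, the "clustering" lallysingh describes ultimately comes down to a CPU-affinity decision. A minimal sketch (the core chosen here is just whatever happens to be first in the current mask; real socket topology would come from `lscpu` or `/sys/devices/system/node`, and `os.sched_setaffinity` is Linux-only):

```python
import os

# Current set of CPUs this process may run on.
allowed = os.sched_getaffinity(0)

# "Cluster": restrict the process to a single core as a stand-in for
# one socket's cores -- NUMA-local DRAM, but a shared, contended cache.
one_core = {min(allowed)}
os.sched_setaffinity(0, one_core)
print(os.sched_getaffinity(0))

# Restoring the full mask corresponds to "spreading" the threads again.
os.sched_setaffinity(0, allowed)
```

A production scheduler would make this choice per workload (and would use `numactl`/libnuma to steer memory placement as well as CPU placement), which is exactly the tradeoff the paper is measuring.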
hollerith almost 12 years ago

Up to 20% slower than what?

(Than SMP systems, I guess, but the OP does not say.)