科技回声

10 comments

marshray, about 12 years ago

> a single “big memory” (192 GB) server we are using has the performance capability of approximately 14 standard (12 GB) servers.

That's 168 GB RAM total, within a single upgrade unit of their 192 GB single server, suggesting the problem is dominated by RAM.

If I had $6638 + $2640 = $9278 to spend on computing hardware from NewEgg, how about:

<pre><code>1 at $1000 of HP ProLiant DL360e Gen8 Rack Server System Intel Xeon E5-2403 1.8GHz 4C/4T 4GB
    http://www.newegg.com/Product/Product.aspx?Item=N82E16859107943
    http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-5249570.html?dnr=1
    (12 DIMM slots)
4 at $70 of Kingston 8GB 240-Pin DDR3 SDRAM ECC Registered DDR3 1333 Server Memory Model KVR13LR9S4/8
    http://www.newegg.com/Product/Product.aspx?Item=N82E16820239540
1 at $54 of Seagate Barracuda ST250DM000 250GB 7200 RPM 16MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drive
    http://www.newegg.com/Product/Product.aspx?Item=N82E16822148765

$1334 ea server, 7 servers = $9338
</code></pre>

So we could get 7 of these low-end name-brand 32 GB servers for roughly the same money to give us 224 GB RAM.

> MR++ runs on 27 servers whereas the standalone configurations are a single server running a single-threaded implementation

Sure, nothing will beat a single system at message-passing algorithms when the entire graph fits in main memory. But when the dataset outgrows that (and it will), we can triple the RAM in the empty slots, or add more servers in units of $1334, instead of having to rewrite our whole analysis.
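
The arithmetic in the comment above can be checked with a short sketch. The prices, quantities, and budget are copied from the comment; the names `PARTS`, `server_cost`, and `cluster_summary` are made up for illustration, and the per-server RAM figure assumes that only the four added 8 GB DIMMs are counted (which is what makes seven servers come to 224 GB).

```python
# Line items (description, quantity, unit price in USD), copied from the comment.
PARTS = [
    ("HP ProLiant DL360e Gen8, Xeon E5-2403, 4 GB", 1, 1000),
    ("Kingston 8 GB DDR3-1333 ECC Registered DIMM", 4, 70),
    ("Seagate Barracuda 250 GB SATA drive", 1, 54),
]

# Budget quoted in the comment: $6638 + $2640.
BUDGET_USD = 6638 + 2640

# Assumption: count only the four added 8 GB DIMMs per server,
# which is what makes seven servers total 224 GB.
RAM_PER_SERVER_GB = 4 * 8


def server_cost(parts):
    """Sum quantity * unit price over all line items for one server."""
    return sum(quantity * unit_price for _, quantity, unit_price in parts)


def cluster_summary(num_servers):
    """Return total cost, total RAM, and budget overrun for identical servers."""
    cost = server_cost(PARTS)
    return {
        "servers": num_servers,
        "cost_usd": num_servers * cost,
        "ram_gb": num_servers * RAM_PER_SERVER_GB,
        "over_budget_usd": num_servers * cost - BUDGET_USD,
    }


if __name__ == "__main__":
    print(server_cost(PARTS))
    print(cluster_summary(7))
```

Under these assumptions one server comes to $1334 and seven come to $9338, which is $60 over the $9278 budget, for 224 GB of RAM in total.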

gruseom, about 12 years ago

Off-topic, but since younger HNers may not know what the title is riffing on, <a href="https://www.google.com/search?q=nobody%20got%20fired%20for%20buying%20ibm" rel="nofollow">https://www.google.com/search?q=nobody%20got%20fired%20for%2...</a>

samspenc, about 12 years ago

Nice try, but a few thoughts from someone who now spends a lot of time in Hadoop/MapReduce. I will admit it took me a while to warm up to the whole concept, but I'm now so familiar with it that I can't think of compute before Hadoop.

I'm able to fire off pretty intensive MapReduce jobs on an Amazon Elastic MapReduce cluster with many nodes for a fraction of the price mentioned in the post (less than $100).

While I can imagine I could repurpose all my MapReduce/Hadoop code to run on a single box - especially since Amazon does offer several high-memory instances today - I would be loath to.

MapReduce provides a really nice framework that lets me scale out compute horizontally, rather than vertically, and that is really handy at terabyte data volumes (data warehousing and large-data analytics).

zdw, about 12 years ago

Totally off-topic typographic comment - anyone else seeing the alt-font ﬁ ligature in the title? It's quasi-bolded on my machine (Safari, OS X), so it sticks out like a sore thumb.

marshray, about 12 years ago

"At 100 GB, scale-up still provides the best performance/$, but the 16-node cluster is close at 88% of the performance/$ of scale-up."

If these message-passing graph algorithms are representative of a "bad fit" to the parallel map-reduce model, I'd say a 12% penalty is not a bad price to pay at all in return for all the benefits of the parallel cluster in other cases.

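The 88% figure and the 12% penalty above are two views of the same ratio. Here is a minimal sketch of how such a ratio could be computed, assuming performance is taken as 1 / runtime (the quoted sentence does not say how the paper defines it). The runtimes and costs below are hypothetical, picked only so that the ratio lands at 0.88; they are not taken from the paper.

```python
def perf_per_dollar(runtime_s, cost_usd):
    """Performance per dollar, taking performance as 1 / runtime (an assumption)."""
    return 1.0 / (runtime_s * cost_usd)


def relative_perf_per_dollar(candidate, baseline):
    """Ratio of the candidate's performance/$ to the baseline's."""
    return perf_per_dollar(*candidate) / perf_per_dollar(*baseline)


# Hypothetical (runtime in seconds, cost in USD); NOT figures from the paper.
SCALE_UP = (110, 8)
CLUSTER = (100, 10)

ratio = relative_perf_per_dollar(CLUSTER, SCALE_UP)
penalty = 1.0 - ratio

if __name__ == "__main__":
    print(f"cluster performance/$ relative to scale-up: {ratio:.2f}")
    print(f"penalty: {penalty:.0%}")
```

With these made-up inputs the cluster is faster but costs more, and its performance/$ comes out at about 0.88 of the scale-up figure, i.e. a penalty of about 12%.
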
saosebastiao, about 12 years ago

I've done a handful of big-memory workloads on the JVM, and I have never seen it not choke on allocation/GC for anything above 300 GB of memory. Does this paper address this limitation?

sauravc, about 12 years ago

Reminds me of this article: <a href="http://blog.wavii.com/2011/12/29/your-mileage-may-vary/" rel="nofollow">http://blog.wavii.com/2011/12/29/your-mileage-may-vary/</a>

rbanffy, about 12 years ago

Having just recovered from a 72+ hour outage (about a dozen Hyper-V based hosts on our hosting provider) caused by a single (large) machine attached to a single (large) storage appliance, I think I'll pass on this idea of just scaling up instead of scaling horizontally.

edit: "pass on" (thanks, marshray)

jamesaguilar, about 12 years ago

I'm surprised they didn't select any tasks that couldn't easily fit on a single machine (largest input set was <200 GB). That said, if scale-up performance can be improved without compromising scale-out, why not?

Theory5, about 12 years ago

"... There IS the occasional savage beating, and more than their fair share of suicides. But that has "statistical clustering" all over it. "

Nobody ever got ﬁred for buying a cluster
