Nobody ever got fired for buying a cluster

72 points by jpmc, about 12 years ago

10 comments

marshray, about 12 years ago
> a single “big memory” (192 GB) server we are using has the performance capability of approximately 14 standard (12 GB) servers.

That's 168 GB RAM total, within a single upgrade unit of their 192 GB single server, suggesting the problem is dominated by RAM.

If I had $6638 + $2640 = $9278 to spend on computing hardware from NewEgg, how about:

  1 at $1000 of HP ProLiant DL360e Gen8 Rack Server System Intel Xeon E5-2403 1.8GHz 4C/4T 4GB
    http://www.newegg.com/Product/Product.aspx?Item=N82E16859107943
    http://h10010.www1.hp.com/wwpc/us/en/sm/WF06a/15351-15351-3328412-241644-241475-5249570.html?dnr=1
    (12 DIMM slots)
  4 at $70 of Kingston 8GB 240-Pin DDR3 SDRAM ECC Registered DDR3 1333 Server Memory Model KVR13LR9S4/8
    http://www.newegg.com/Product/Product.aspx?Item=N82E16820239540
  1 at $54 of Seagate Barracuda ST250DM000 250GB 7200 RPM 16MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drive
    http://www.newegg.com/Product/Product.aspx?Item=N82E16822148765

  $1334 ea server, 7 servers = $9338

So we could get 7 of these low-end name-brand 32 GB servers (4 x 8 GB DIMMs each) for about the same money, giving us 224 GB RAM.

> MR++ runs on 27 servers whereas the standalone configurations are a single server running a single-threaded implementation

Sure, nothing will beat a single system at message-passing algorithms when the entire graph fits in main memory. But when the dataset outgrows that (and it will), we can triple the RAM in the empty slots, or add more servers in units of $1334, instead of having to rewrite the whole analysis.
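
For anyone who wants to check the arithmetic in the parts list above, here is a minimal sketch. The prices and part counts are the ones quoted in the comment; the function and variable names are made up for illustration, and the RAM total assumes only the four added 8 GB DIMMs are counted (the 4 GB the server ships with is ignored, which is what the 224 GB figure implies).

```python
# Prices and quantities come from the comment; all names here are illustrative.
SERVER_PRICE = 1000       # HP ProLiant DL360e Gen8, per unit (USD)
DIMM_PRICE = 70           # Kingston 8 GB ECC registered DIMM (USD)
DIMM_SIZE_GB = 8
DIMMS_PER_SERVER = 4
DISK_PRICE = 54           # Seagate Barracuda 250 GB (USD)
BUDGET = 6638 + 2640      # the two figures summed in the comment (USD)


def cluster_for(servers: int) -> dict:
    """Return per-server cost, total cost, and total RAM for `servers` machines.

    Assumption: RAM counts only the added DIMMs, not the 4 GB stock module.
    """
    per_server = SERVER_PRICE + DIMMS_PER_SERVER * DIMM_PRICE + DISK_PRICE
    return {
        "per_server_usd": per_server,
        "total_usd": per_server * servers,
        "total_ram_gb": servers * DIMMS_PER_SERVER * DIMM_SIZE_GB,
    }


quote = cluster_for(7)
print(quote)
print("difference from budget (USD):", quote["total_usd"] - BUDGET)
```

Seven servers come out slightly above the combined budget figure, so "the same money" is approximate rather than exact.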
gruseom, about 12 years ago
Off-topic, but since younger HNers may not know what the title is riffing on: https://www.google.com/search?q=nobody%20got%20fired%20for%20buying%20ibm
samspenc, about 12 years ago
Nice try, but a few thoughts from someone who now spends a lot of time in Hadoop/MapReduce. I will admit it took me a while to warm up to the whole concept, but I'm now so familiar with it that I can't think about compute the way I did before Hadoop.

I'm able to fire off pretty intensive MapReduce jobs on an Amazon Elastic MapReduce cluster with many nodes for a fraction of the price mentioned in the post (less than $100).

While I can imagine I could repurpose all my MapReduce/Hadoop code to run on a single box, especially since Amazon does offer several high-memory instances today, I would be loath to.

The MapReduce framework lets me scale compute out horizontally rather than up vertically, and that is really handy at terabyte data volumes (data warehousing and large-data analytics).
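
To make the "scale out horizontally" point concrete, here is a toy sketch of the map/shuffle/reduce pattern in plain Python. It is not the Hadoop or Elastic MapReduce API: `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names, and the partitions are processed sequentially in one process, standing in for what would be separate worker nodes.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer, partitions=4):
    """Toy map/shuffle/reduce; each partition stands in for one worker node."""
    # Map + shuffle: emit (key, value) pairs and route each key to a partition.
    buckets = [defaultdict(list) for _ in range(partitions)]
    for record in records:
        for key, value in mapper(record):
            buckets[hash(key) % partitions][key].append(value)

    # Reduce: each partition reduces its own keys independently of the others,
    # which is the property that lets real frameworks add nodes to add capacity.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = reducer(key, values)
    return result


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


counts = map_reduce(
    ["scale up", "scale out", "Scale out again"],
    word_mapper,
    count_reducer,
)
print(counts)
```

Because no partition needs to see another partition's keys, raising `partitions` (or, in a real cluster, the node count) changes where work runs without changing the mapper or reducer code.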
zdw, about 12 years ago
Totally off-topic typographic comment: anyone else seeing the alt-font fi ligature in the title? It's quasi-bolded on my machine (Safari, OS X), so it sticks out like a sore thumb.
marshray, about 12 years ago
"At 100 GB, scale-up still provides the best performance/$, but the 16-node cluster is close at 88% of the performance/$ of scale-up."

If these message-passing graph algorithms are representative of a "bad fit" to the parallel map-reduce model, I'd say a 12% penalty is not a bad price to pay *at all* in return for all the benefits of the parallel cluster in other cases.
saosebastiao, about 12 years ago
I've done a handful of big-memory workloads on the JVM, and I have never seen it not choke on allocation/GC for anything above 300 GB of memory. Does this paper address this limitation?
sauravc, about 12 years ago
Reminds me of this article: http://blog.wavii.com/2011/12/29/your-mileage-may-vary/
rbanffy, about 12 years ago
Having just recovered from a 72+ hour outage (about a dozen Hyper-V based hosts on our hosting provider) caused by a single (large) machine attached to a single (large) storage appliance, I think I'll pass on this idea of just scaling up instead of scaling horizontally.

edit: "pass on" (thanks, marshray)
jamesaguilar, about 12 years ago
I'm surprised they didn't select any tasks that couldn't easily fit on a single machine (the largest input set was <200 GB). That said, if scale-up performance can be improved without compromising scale-out, why not?
Theory5, about 12 years ago
"... There IS the occasional savage beating, and more than their fair share of suicides. But that has "statistical clustering" all over it. "