科技回声

Setting a new world record in CloudSort with Apache Spark

86 points by rxin, over 8 years ago

4 comments

devonkim, over 8 years ago
It's not clear *how much* of these improvements comes from reductions in pricing rather than from algorithms and design decisions. They've documented things like using Netty for network latency, avoiding GC, and getting better with Spark, but it'd be interesting if the team could go back and run the benchmark on the same infrastructure as their 2014 benchmark for a code-vs-code comparison, to separate engineering improvements from economies of scale.
embiggen, over 8 years ago
Meanwhile, Google was sorting petabytes in under a minute on their clusters 6+ years ago. We've still got a long way to go in OSS land to compete with the big boys.
flukus, over 8 years ago
A price record, not a performance one.

Also, seeing how expensive it is to sort 100TB ($144), you have to wonder why it wouldn't be better to do it on your own hardware.
iaw, over 8 years ago
I got excited, and then I saw that this was for sorting, not storage...