Setting a new world record in CloudSort with Apache Spark

86 points by rxin over 8 years ago

4 comments

devonkim over 8 years ago
It's not clear *how much* of these improvements are from reductions in pricing rather than algorithms and design decisions. They've documented things like using Netty for network latency, avoiding GC, and getting better with Spark, but it'd be interesting if the team could go back and run the benchmark using the same infrastructure as their 2014 benchmark for a code-vs-code comparison to separate engineering improvements from economies of scale.
embiggen over 8 years ago
Meanwhile, Google was sorting petabytes in under a minute on their clusters 6+ years ago. We've still got a long way to go in OSS land to compete with the big boys.
flukus over 8 years ago
A price record not a performance one.

Also, seeing how expensive it is to sort 100TB ($144) you have to wonder why it wouldn't be better to do it on your own hardware.
iaw over 8 years ago
I got excited and then I saw that this was for sorting not storage...