TechEcho

Hadoop sorts a petabyte in 16.25 hours and a terabyte in 62 seconds

36 points by voberoi about 16 years ago

4 comments

ariwilson about 16 years ago
Google did the terabyte slightly slower (68 seconds) on 4x fewer machines, but did the petabyte in 6 hours and 2 minutes (around 1/3 of the time of Hadoop) on nearly the same number of machines (4000).
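A quick check of the ratios in this comment, as a minimal sketch. It uses only the times quoted in the thread and the post title; the helper name `time_ratio` is made up for illustration.

```python
# Times quoted in the post title and in the comment above.
HADOOP_TB_SEC = 62               # Hadoop, 1 TB sort
GOOGLE_TB_SEC = 68               # Google, 1 TB sort
HADOOP_PB_MIN = 16.25 * 60       # Hadoop, 1 PB sort: 16.25 hours in minutes
GOOGLE_PB_MIN = 6 * 60 + 2       # Google, 1 PB sort: 6 hours 2 minutes


def time_ratio(a, b):
    """Return how long run `a` took relative to run `b` (hypothetical helper)."""
    return a / b


tb_ratio = time_ratio(GOOGLE_TB_SEC, HADOOP_TB_SEC)   # Google vs Hadoop, terabyte
pb_ratio = time_ratio(GOOGLE_PB_MIN, HADOOP_PB_MIN)   # Google vs Hadoop, petabyte

print(f"terabyte: Google took {tb_ratio:.2f}x Hadoop's time")
print(f"petabyte: Google took {pb_ratio:.2f}x Hadoop's time")
```

On these figures the petabyte ratio comes out a little above one third, which agrees with the "around 1/3" estimate.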
tlrobinson about 16 years ago
I think you mean "3800 machines running Hadoop"
voberoi about 16 years ago
Their report (linked to from the post) goes into greater detail: http://developer.yahoo.com/blogs/hadoop/Yahoo2009.pdf

I'd love to know why the 500 GB and 100 TB sorts ran at about half the speed of the other two (~0.5 TB/min as opposed to ~1 TB/min).
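The "~1 TB/min" figure for the other two sorts can be reproduced from the times in the post title. This is a rough sketch that assumes decimal units (1 PB = 1000 TB); the function name is illustrative. The run times for the 500 GB and 100 TB sorts are not given in this thread, so they are left out.

```python
def throughput_tb_per_min(size_tb, seconds):
    """Sort throughput in TB per minute (hypothetical helper)."""
    return size_tb / (seconds / 60)


# Times from the post title; 1 PB taken as 1000 TB (assumption).
tb_sort = throughput_tb_per_min(1, 62)                 # 1 TB in 62 seconds
pb_sort = throughput_tb_per_min(1000, 16.25 * 3600)    # 1 PB in 16.25 hours

print(f"1 TB sort: {tb_sort:.2f} TB/min")
print(f"1 PB sort: {pb_sort:.2f} TB/min")
```

Both land within a few percent of 1 TB/min.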
bayareaguy about 16 years ago
A *minute* to sort 1TB on a system with 11TB of RAM?