Large Scale Distributed Deep Learning on Hadoop Clusters

42 points by cjdulberger over 9 years ago

1 comment

duggan over 9 years ago
Both this and Twitter Engineering's recent post[1] on HDFS make me wonder whether HDFS is something a team would reach for in 2015.

I'm starting to read into the technologies in this area (i.e., I have not used much of the Hadoop stack yet), and I haven't found a fundamental reason why one would not base their batch processing on S3 (or your object store of choice). Existing software appears to make assumptions about the storage medium being a local hard drive.

Much of the challenge of HDFS appears to be around scaling the NameNode, and provisioning capacity. S3 dispenses with these issues, and the only cost appears to be throughput.

If software like Spark was modified to have a much more native approach to S3, could HDFS be dispensed with entirely?

[1] https://blog.twitter.com/2015/hadoop-filesystem-at-twitter