Both this and Twitter Engineering's recent post[1] on HDFS make me wonder whether HDFS is something a team would reach for in 2015.<p>I'm just starting to read up on the technologies in this area (i.e., I have not used much of the Hadoop stack yet), and I haven't found a fundamental reason not to base batch processing on S3 (or the object store of your choice). Existing software appears to assume that the storage medium is a local hard drive.<p>Much of the challenge of HDFS appears to lie in scaling the NameNode and provisioning capacity. S3 dispenses with both issues, and the only cost appears to be throughput.<p>If software like Spark were modified to take a much more native approach to S3, could HDFS be dispensed with entirely?<p>[1] <a href="https://blog.twitter.com/2015/hadoop-filesystem-at-twitter" rel="nofollow">https://blog.twitter.com/2015/hadoop-filesystem-at-twitter</a>