I am trying to learn Hadoop and was wondering whether there are any references, tutorials, or papers that HNers use that I could use to make learning Hadoop simpler, more efficient, and more productive.
Learning Hadoop can be split into several pieces:

1. Learning the idea of MapReduce. This is fairly easy: you can skim the original research paper (Dean and Ghemawat's MapReduce paper from Google) and figure it out.

2. Learning the weird, wild animal called Hadoop (with its multiple-API mess, e.g. the old `mapred` and the new `mapreduce` packages). This is going to be much harder. Presuming you know Java, the first thing you want to do is get a Cloudera VM (because at first you don't really want to spend time learning how to install Hadoop) and start figuring out how to build Word Count inside the VM. This should give you some insight (though not much) into how the API works.

3. Figure out the more complicated things you want to do with Hadoop and start working on them. Get a copy of Tom White's book, Hadoop: The Definitive Guide (from what I remember six months back, its API coverage was hopelessly outdated, but the ideas are awesome), and Jimmy Lin's book, Data-Intensive Text Processing with MapReduce (http://lintool.github.com/MapReduceAlgorithms/). Personally, I loved Jimmy's book not for the machine learning content but for the design patterns for Hadoop embedded in it.
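To make the MapReduce idea and the Word Count exercise concrete before fighting with the real Hadoop API, here is a minimal in-memory sketch in Python. It is a simulation under stated assumptions, not Hadoop's API: every function name (`wordcount_map`, `run_job`, `count_emitted`, etc.) is hypothetical, and a Python dict stands in for Hadoop's distributed shuffle and sort. It also includes a second mapper using in-mapper combining, one of the design patterns discussed in Jimmy Lin's book, which aggregates counts locally so the mapper emits fewer intermediate pairs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical in-memory simulation of the MapReduce data flow
# (map -> shuffle/sort -> reduce). This is NOT Hadoop's Java API.

KV = Tuple[str, int]
Record = Tuple[int, str]  # (document id, document text)


def wordcount_map(doc_id: int, text: str) -> Iterator[KV]:
    """Naive mapper: emit (word, 1) for every token."""
    for word in text.lower().split():
        yield word, 1


def wordcount_map_in_mapper_combining(doc_id: int, text: str) -> Iterator[KV]:
    """In-mapper combining: sum counts locally, emit each word once per record."""
    for word, n in Counter(text.lower().split()).items():
        yield word, n


def wordcount_reduce(word: str, values: Iterable[int]) -> Iterator[KV]:
    """Reducer: add up all partial counts for a single word."""
    yield word, sum(values)


def run_job(records: List[Record], mapper, reducer) -> Dict[str, int]:
    # Map phase, with a simulated shuffle: group intermediate values by key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            grouped[k].append(v)
    # Reduce phase: visit keys in sorted order, mimicking the sort step.
    output: Dict[str, int] = {}
    for k in sorted(grouped):
        for out_k, out_v in reducer(k, grouped[k]):
            output[out_k] = out_v
    return output


def count_emitted(records: List[Record], mapper) -> int:
    """Number of intermediate (key, value) pairs a mapper sends to the shuffle."""
    return sum(1 for key, value in records for _ in mapper(key, value))


if __name__ == "__main__":
    docs = [(1, "the quick brown fox"), (2, "the lazy dog and the fox")]
    print(run_job(docs, wordcount_map, wordcount_reduce))
    # {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
    print(count_emitted(docs, wordcount_map))                      # 10
    print(count_emitted(docs, wordcount_map_in_mapper_combining))  # 9
```

Both mappers produce the same final counts; the combining version just shuffles fewer pairs (9 instead of 10 here). On real data with many repeated words per input split, that reduction in network traffic is the whole point of the pattern.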