Hi,<p>I picked up Hadoop a couple of months back and am looking for different ways to use it, so that I can apply what I have learned so far (MapReduce, Hive, Pig, etc.).<p>Since Hadoop is mostly used in environments where the data to be queried is large, I started looking around for that kind of data. I came across the Wikipedia data (available for download).<p>Now I am trying to list the questions that I could ask of this data.<p>What questions would you want answered from the Wikipedia data?<p>This will help me write some useful MapReduce code, Hive queries, or Pig scripts to improve my skills.<p>I just feel that learning by doing is the best form of learning.<p>Thanks.
You could replicate <a href="http://en.wikipedia.org/wiki/Most_common_words_in_English" rel="nofollow">http://en.wikipedia.org/wiki/Most_common_words_in_English</a> using Wikipedia as your corpus.
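<p>A rough sketch of how that word count might look, written as plain Python that mimics the map, sort, and reduce phases locally before you port it to a real job. The function names (`map_words`, `reduce_counts`, `top_words`) and the tokenizing regex are my own assumptions for illustration; a real run would also need to strip wiki markup from the dump first, which this does not attempt.

```python
import re
from itertools import groupby

# Assumed tokenizer: lowercase ASCII words, optionally with one inner apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def map_words(line):
    """Map step: emit (word, 1) for every word found in a line of text."""
    for word in WORD_RE.findall(line.lower()):
        yield word, 1


def reduce_counts(sorted_pairs):
    """Reduce step: sum the counts per word. Input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def top_words(lines, n=10):
    """Run map, sort, and reduce in memory and return the n most common words."""
    mapped = [kv for line in lines for kv in map_words(line)]
    mapped.sort(key=lambda kv: kv[0])  # stands in for Hadoop's shuffle/sort
    counts = list(reduce_counts(mapped))
    # Highest count first; ties broken alphabetically for a stable result.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    sample = ["The cat and the dog", "the end"]
    for word, count in top_words(sample, n=3):
        print(word, count)
```

The same mapper and reducer logic should carry over fairly directly to a streaming-style job reading stdin, or to a GROUP BY and ORDER BY query in Hive once the text is tokenized into one word per row.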