Ask HN: Learning Hadoop

2 points by boniface316 over 8 years ago

I have started taking some basic data science courses on edx.org. Then I came across Hadoop, and I would really love to learn it. I have the following questions and would really appreciate your help:

1. What is the best source to start learning Hadoop? I was thinking of starting with Udacity or Big Data University.

2. Do I need Linux to run Hadoop? I am having Wi-Fi issues even after upgrading the driver.

3. In order to be employed, do I need to learn the entire system or just one portion of it, like Spark, Hive, or Pig?

Please advise.

4 comments

brudgers over 8 years ago

Caveat: This is random advice from the internet.

1. If it were me, I'd start by installing Hadoop on a laptop, since Googling indicates it's doable... for some definition of 'doable.' Even if I could not get it to work, reading the documentation and researching whatever problems I encountered would deepen my practical knowledge. Getting Hadoop up and running is also part of any practical working definition of 'knowing Hadoop.'

2. Linux wireless driver blobs have been a source of pain for me. My workarounds have been:

a. Purchase well-supported hardware, e.g. a used ThinkPad and cards without obscure Broadcom chips.

b. Use an external wireless router and an Ethernet cable. That's how I connect desktops and laptops around the office.

3. My gut is that the important knowledge for many positions requiring or preferring Hadoop will relate more to data science than to technical expertise. On the other hand, looping back to my earlier advice, positions that are Hadoop-first rather than data-science-first will benefit from an operational understanding.

Lastly, what I've been hearing about the industry is that 'embarrassingly parallel' workloads that can take full advantage of Hadoop are not as common as was thought a few years ago. The big useful innovation of Hadoop is looking like the underlying Hadoop Distributed File System (HDFS), with other big data search/query tools being built over it.

That's not to say Hadoop is dead or not worth exploring, particularly at the technical level of HDFS and in terms of applying data-science concepts. Learning Pig or Hive makes sense in service of learning how to apply data science concepts. Because Hive is based on SQL, it is probably the more generalizable skill... and learning SQL is probably more useful than learning either in terms of employment.

Good luck.
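To make the 'embarrassingly parallel' point above concrete, here is a minimal sketch of the classic word-count job, the usual first MapReduce exercise. It is not from the thread. It assumes the Hadoop Streaming contract (mapper and reducer exchange tab-separated `key\tvalue` lines, and the framework sorts records by key before the reducer sees them) and simulates that pipeline on one machine in plain Python; the function names `mapper`, `reducer`, and `run_locally` are hypothetical, and the `sorted()` call only stands in for Hadoop's distributed shuffle-and-sort.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit one 'word\t1' record per word.

    Each input line is processed independently, which is what makes
    the job embarrassingly parallel across input splits.
    """
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reduce phase: records arrive sorted by key, so identical words
    are adjacent and can be summed in a single pass."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort -> reduce on a single machine."""
    mapped = list(mapper(lines))
    shuffled = sorted(mapped)  # stand-in for Hadoop's shuffle-and-sort step
    return list(reducer(shuffled))


if __name__ == "__main__":
    sample = ["learning hadoop", "hadoop and hive", "hive is based on sql"]
    for record in run_locally(sample):
        print(record)
```

On a real cluster the mapper and reducer would typically be separate scripts reading stdin and writing stdout, with HDFS holding the input and output. Workloads that don't decompose this cleanly into independent map tasks plus a keyed reduce are the ones that benefit less from Hadoop.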
praneshp over 8 years ago

I learned Hadoop in grad school in 2013. If you can spend a little bit of cash, get some VMs on AWS and follow one of the many guides out there (for example, Cloudera's) to install Hadoop. That should be enough to build something like: http://blog.cloudera.com/blog/2012/09/analyzing-twitter-data-with-hadoop/

I started out trying VMs on VirtualBox, then a couple of different laptops at home, etc., but AWS was the easiest setup in the end.
mtmail over 8 years ago

There's also http://www.cloudera.com/training.html

You can run Linux in a virtual machine (VirtualBox, VMware, etc.), where you wouldn't have to deal with Wi-Fi drivers because it uses the existing network connection from the host operating system.
mtmail over 8 years ago

There are a couple of pointers to books in https://news.ycombinator.com/item?id=12389595