Anyone who wants to pick up Spark basics: Berkeley (Spark was developed at Berkeley's AMPLab), together with Databricks (the commercial company started by Spark's creators), just started a free MOOC on edX: <a href="https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x" rel="nofollow">https://www.edx.org/course/introduction-big-data-apache-spar...</a><p>(If you're wondering what Spark is, in a very unofficial nutshell: it's a computation / big data / analytics / machine learning / graph processing engine, often run on top of Hadoop, that usually performs much better than Hadoop MapReduce and arguably has a much easier API, in Python, Scala, Java and now R.)<p>It has more than 5,000 students so far, and the professor seems to answer every single question on Piazza (a popular student/teacher message board).<p>So far it looks really good. It started a week ago, so you can still catch up: the 2nd lab isn't due until Friday 6/12 EOD, and you get a 3-day grace period on top of that... and there isn't too much to catch up on.<p>I use Spark for work (Scala API) and still learned one or two new things.<p>It uses the PySpark API, so there's no need to learn Scala. All homework labs are done in an IPython notebook. 
Very high quality so far IMHO.<p>It is followed by a more advanced Spark course (Scalable Machine Learning), also by Berkeley & Databricks.<p><a href="https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x" rel="nofollow">https://www.edx.org/course/scalable-machine-learning-uc-berk...</a><p>(Not affiliated with edX, Berkeley or Databricks; I just thought this was a good place for a PSA to those interested.)<p>Matei Zaharia (the creator of Spark) won the ACM's 2014 Doctoral Dissertation Award for the PhD work behind Spark (<a href="http://www.acm.org/press-room/news-releases/2015/dissertation-award-14/" rel="nofollow">http://www.acm.org/press-room/news-releases/2015/dissertatio...</a>)<p>Spark also set a new record in large-scale sorting, beating the previous Hadoop record by a wide margin: <a href="https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html" rel="nofollow">https://databricks.com/blog/2014/11/05/spark-officially-sets...</a><p>* EDIT: typo in "Berkeley", thanks gboss for noticing :)