I'm pretty sure this was a homework assignment in the recent Scalable Machine Learning edX course. :P<p>I've been looking for movie data to analyze with Spark/MLlib, but the IMDb database is too unwieldy.
AMPCamp 6 has been announced for Nov 19 and 20: <a href="http://ampcamp.berkeley.edu/6/" rel="nofollow">http://ampcamp.berkeley.edu/6/</a>