I do the same on Google Compute Engine, except without the auto-terminate and scaling :(<p>However, Google's bdutil has a great set of shell scripts that set up the environment automatically, and with minimal changes you can get the exact Scala/Spark versions you need.<p>The fact that I (just one dude) can set up a pipeline and chomp through TBs of data on clusters with TBs of memory over the course of hours still keeps me in awe of the advances in both GCE and AWS.<p>I'll have to give EMR/AWS a shot!