科技回声

I do the same on Google Compute Engine, except without the auto-terminate and scaling :(

However, Google's bdutil has a great set of shell scripts that automatically set up the environment, and with minimal changes you can set up the exact Scala/Spark versions you need.

The fact that I (just one dude) can set up a pipeline and chomp through TBs of data on clusters with TBs of memory over the course of hours still keeps me in awe of the advances of both GCE and AWS.

I'll have to give EMR/AWS a shot!

Jeff here. Glad that people are interested in this post. Feel free to ping me with any questions.

Large-Scale Machine Learning with Spark on Amazon EMR

2 comments
