Spinning Up a Spark Cluster on Spot Instances: Step by Step

37 points by ddrum001, over 9 years ago

5 comments

eranation, over 9 years ago
Very nice. I prefer using the built-in command-line EC2 scripts packaged with all Spark versions.
https://spark.apache.org/docs/latest/ec2-scripts.html
You can even specify the spot instance price, instance type, number of nodes, etc.
e.g.
./spark-ec2 --key-pair=awskey --identity-file=awskey.pem --region=us-west-1 --zone=us-west-1a --spot-price=0.2 --instance-type=m4.4xlarge launch my-spark-cluster
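For anyone scripting this, here is a minimal sketch that assembles the same spark-ec2 argument list from named parameters. The function name and its parameter names are hypothetical; the flags are only the ones quoted in the comment above, and the snippet prints the command instead of running it.

```python
import shlex


def build_spark_ec2_command(cluster_name, key_pair, identity_file, region,
                            zone, spot_price, instance_type,
                            script="./spark-ec2"):
    """Return the argv list for a spark-ec2 spot launch.

    Hypothetical helper: it only mirrors the flags shown in the comment
    above and does not validate values or contact AWS.
    """
    return [
        script,
        f"--key-pair={key_pair}",
        f"--identity-file={identity_file}",
        f"--region={region}",
        f"--zone={zone}",
        f"--spot-price={spot_price}",  # maximum bid, as in the example
        f"--instance-type={instance_type}",
        "launch",
        cluster_name,
    ]


# Values taken from the example command in the comment.
argv = build_spark_ec2_command(
    cluster_name="my-spark-cluster",
    key_pair="awskey",
    identity_file="awskey.pem",
    region="us-west-1",
    zone="us-west-1a",
    spot_price="0.2",
    instance_type="m4.4xlarge",
)

# Print a shell-safe rendering; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems if it is later handed to subprocess.run.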
boulos, over 9 years ago
If you don't care to do all the setup yourself, we've recently announced Dataproc as a fully managed service, including support for Preemptible VMs: https://cloud.google.com/dataproc/
Disclaimer: I work on Compute Engine, specifically Preemptible VMs, but didn't work on Dataproc (though I did add --preemptible to bdutil!)
stuartaxelowen, over 9 years ago
There's a script for that, actually:
http://spark.apache.org/docs/latest/ec2-scripts.html
technofiend, over 9 years ago
Man, I wish I could use Spark at work, but it uses a Mavenized build that requires open internet access to download dependencies. Not worth it when I have to register every dependent JAR file by hand in my internal repository.
angryasian, over 9 years ago
Why wouldn't you just use EMR with YARN and Spark 1.5?