Ask HN: What are the alternatives for hosting Apache Spark?

28 points by muramira about 7 years ago
I truly love what Databricks is doing, but their pricing model is unpredictable. Are there any other hosting companies that provide a fixed price?

12 comments

SmirkingRevenge about 7 years ago
If your Spark jobs are mostly batch workloads that can tolerate moderately infrequent failures and restarts, try Google Dataproc with preemptible VMs or Amazon EMR with spot instances.
Depending on your use case, you might spend many times less than you would with regular VMs. Many instances that cost several dollars an hour on AWS can be used for a fraction of the price.
It's also fairly easy to automate the region selection and bid (on AWS, that is; not sure about gcloud).
If you need streaming, obviously this might not be the way to go.
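As a rough illustration of automating the region selection and bid mentioned above, here is a minimal sketch. It assumes you have already collected recent spot prices per region; the prices, the `headroom` factor, and the function name are hypothetical, and real numbers would come from the cloud provider's price-history data.

```python
from statistics import mean

# Hypothetical hourly spot prices (USD) per region for one instance type.
# These are made-up sample values, not real AWS prices.
SPOT_PRICE_HISTORY = {
    "us-east-1": [0.21, 0.25, 0.19, 0.22],
    "us-west-2": [0.18, 0.17, 0.20, 0.19],
    "eu-west-1": [0.24, 0.23, 0.26, 0.25],
}


def pick_region_and_bid(history, on_demand_price, headroom=1.2):
    """Pick the region with the lowest average spot price and suggest a bid.

    The bid is the chosen region's recent peak price times `headroom`,
    capped at the on-demand price so a spot node never costs more than
    a regular one.
    """
    region = min(history, key=lambda r: mean(history[r]))
    bid = min(max(history[region]) * headroom, on_demand_price)
    return region, round(bid, 4)


region, bid = pick_region_and_bid(SPOT_PRICE_HISTORY, on_demand_price=0.80)
print(region, bid)
```

The headroom above the recent peak is one simple way to reduce interruptions; a lower bid saves more but gets reclaimed more often, which is why this approach suits restartable batch jobs.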
perlin about 7 years ago
Rewrite all of your jobs using Apache Beam. Then use whatever runner you want: Spark, Flink, Google Cloud Dataflow, etc.
sandGorgon about 7 years ago
Google Dataproc - very good, and very soon they will release Kubernetes as the cluster manager instead of YARN.
Zaheer about 7 years ago
Check out AWS Glue: https://aws.amazon.com/glue/
Disclosure: I work on this service.
tejasmanohar about 7 years ago
All 3 major cloud providers have offerings in this space: Amazon [0], Google [1], Microsoft [2].
[0]: https://aws.amazon.com/emr/
[1]: https://cloud.google.com/dataproc/
[2]: https://azure.microsoft.com/en-us/services/databricks/
Sevii about 7 years ago
You could give AWS EMR a shot; it probably doesn't offer as much as Databricks but should have consistent pricing.
antoncohen about 7 years ago
Run Spark on a managed Kubernetes like GKE? There is experimental support for using Kubernetes as the cluster manager.
https://apache-spark-on-k8s.github.io/userdocs/index.html
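For a sense of what submitting to Kubernetes could look like, here is a minimal sketch. It assumes the flags documented for the Kubernetes support in upstream Spark 2.3 and later; the fork linked above may use different property names. The API server address, container image, and jar path are hypothetical placeholders, and the function only builds the command without running it.

```python
import shlex


def spark_on_k8s_command(api_server, image, app_jar, main_class, executors=2):
    """Build a spark-submit argument list that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",   # Kubernetes API server as the master
        "--deploy-mode", "cluster",          # driver runs inside the cluster
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


cmd = spark_on_k8s_command(
    api_server="https://203.0.113.10:443",            # placeholder address
    image="registry.example.com/spark:latest",        # hypothetical image
    app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # hypothetical path
    main_class="org.apache.spark.examples.SparkPi",
    executors=5,
)
print(shlex.join(cmd))
```

On a managed service such as GKE, the cost then follows the node pool you provision rather than a per-job platform fee, which may be part of the appeal for predictable pricing.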
hiyer about 7 years ago
You can try Qubole [0]. The pricing is a small percentage of what you pay to the cloud provider, so it's predictable to an extent.
[0]: https://www.qubole.com/
Disclosure: I work here.
tspann about 7 years ago
https://hortonworks.com/products/data-platforms/cloud/aws/
scarecrowx about 7 years ago
We're using Spark on EMR with Data Pipeline to do ETL and to run scheduled jobs. Data Pipeline terminates the cluster once the ETL or job completes, which helps us save a lot on cost.
shelzzzzz about 7 years ago
What part of it is unpredictable? I guess if you know how many VMs or EC2 instances you're planning on using, it's the same pricing model as Dataproc or EMR.
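To make that point concrete, here is a back-of-the-envelope sketch of a per-node-hour pricing model, assuming each node pays a VM rate plus a platform fee for every hour it runs. All rates and cluster sizes below are hypothetical, not actual Databricks, Dataproc, or EMR prices.

```python
def monthly_cluster_cost(nodes, hours_per_day, vm_rate, platform_rate, days=30):
    """Estimate monthly cost in USD for a cluster billed per node-hour.

    Each node is charged the VM rate plus a per-node platform fee for
    every hour it is running.
    """
    node_hours = nodes * hours_per_day * days
    return round(node_hours * (vm_rate + platform_rate), 2)


# Hypothetical rates: $0.40/hour for the VM, $0.10/hour platform fee.
always_on = monthly_cluster_cost(nodes=10, hours_per_day=24, vm_rate=0.40, platform_rate=0.10)
nightly = monthly_cluster_cost(nodes=10, hours_per_day=3, vm_rate=0.40, platform_rate=0.10)
print(always_on, nightly)
```

Under this model the bill is predictable once the node count and running hours are fixed; the uncertainty comes from how long jobs actually run and whether the cluster is shut down afterwards.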
curiousDog about 7 years ago
Check out Azure Databricks