If your Spark jobs are mostly batch workloads that can tolerate occasional failures and restarts, try Google Dataproc with preemptible VMs or Amazon EMR with Spot Instances.<p>Depending on your use case, you might spend many times less than you would on regular VMs; instances that cost several dollars an hour on AWS can often be had for a fraction of the price.<p>It's also fairly easy to automate the region selection and bid (on AWS, at least; I'm not sure about gcloud).<p>If you need streaming, obviously this might not be the way to go.
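To give a rough idea of what "automate the region selection and bid" can look like: a minimal Python sketch that picks the cheapest availability zone from spot price history. The boto3 call is shown commented for context (it needs AWS credentials); the helper name `pick_cheapest_zone` and the sample prices are illustrative, not a real API.

```python
# The real data would come from EC2's spot price history, e.g.:
# import boto3
# ec2 = boto3.client("ec2", region_name="us-east-1")
# history = ec2.describe_spot_price_history(
#     InstanceTypes=["r5.xlarge"], ProductDescriptions=["Linux/UNIX"]
# )["SpotPriceHistory"]

def pick_cheapest_zone(history):
    """Return (zone, price) for the record with the lowest spot price."""
    best = min(history, key=lambda rec: float(rec["SpotPrice"]))
    return best["AvailabilityZone"], float(best["SpotPrice"])

# Sample records shaped like describe_spot_price_history output
# (prices made up for illustration):
sample = [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.35"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.29"},
    {"AvailabilityZone": "us-east-1c", "SpotPrice": "0.41"},
]
print(pick_cheapest_zone(sample))  # -> ('us-east-1b', 0.29)
```

You'd run something like this on a schedule (or right before launching a cluster) and feed the winning zone into your EMR or Dataproc launch request.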
Check out AWS Glue: <a href="https://aws.amazon.com/glue/" rel="nofollow">https://aws.amazon.com/glue/</a><p>Disclosure: I work on this service
All 3 major cloud providers have offerings in this space. Amazon [0], Google [1], Microsoft [2].<p>[0]: <a href="https://aws.amazon.com/emr/" rel="nofollow">https://aws.amazon.com/emr/</a><p>[1]: <a href="https://cloud.google.com/dataproc/" rel="nofollow">https://cloud.google.com/dataproc/</a><p>[2]: <a href="https://azure.microsoft.com/en-us/services/databricks/" rel="nofollow">https://azure.microsoft.com/en-us/services/databricks/</a>
Run Spark on a managed Kubernetes service like GKE? There is experimental support for using Kubernetes as the cluster manager.<p><a href="https://apache-spark-on-k8s.github.io/userdocs/index.html" rel="nofollow">https://apache-spark-on-k8s.github.io/userdocs/index.html</a>
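For a sense of what that looks like in practice, submission is just spark-submit pointed at the Kubernetes API server. This is a sketch based on the experimental docs linked above; the API server address, namespace, image names, and jar path are all placeholders you'd swap for your own.

```shell
# Submit a Spark job with Kubernetes as the cluster manager (experimental).
# <api-server-host>:<port> and the docker images are placeholders.
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<api-server-host>:<port> \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.app.name=spark-pi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.driver.docker.image=<your-spark-driver-image> \
  --conf spark.kubernetes.executor.docker.image=<your-spark-executor-image> \
  local:///path/to/examples.jar
```

The driver and executors run as pods, so on GKE you get the same autoscaling/preemptible-node cost levers as any other workload.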
You can try Qubole [0]. The pricing is a small percentage of what you pay to the cloud provider, so it's predictable to an extent.<p>[0]: <a href="https://www.qubole.com/" rel="nofollow">https://www.qubole.com/</a><p>Disclosure: I work here.
We're using Spark on EMR with Data Pipeline to do ETL and run scheduled jobs. Data Pipeline terminates the cluster once the ETL or job completes, which saves us a lot of cost.
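The same terminate-on-completion behavior is available without Data Pipeline: EMR's API accepts a flag that shuts the cluster down once its steps finish. A minimal sketch below builds such a request; the release label, instance types, and helper name are illustrative assumptions, and the actual submission call is left commented since it needs AWS credentials.

```python
def transient_cluster_request(name, steps):
    """Build a run_job_flow request for a cluster that self-terminates."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.13.0",          # illustrative release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceCount": 3,
            "MasterInstanceType": "m4.large",  # illustrative instance types
            "SlaveInstanceType": "m4.large",
            # The key bit: terminate the cluster when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
    }

request = transient_cluster_request("nightly-etl", steps=[])
# Submit with boto3 (requires credentials):
# import boto3
# boto3.client("emr").run_job_flow(**request)
print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])  # -> False
```

With `KeepJobFlowAliveWhenNoSteps` set to `False`, you pay only for the minutes the steps actually run, which is the same cost win the parent comment gets from Data Pipeline.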