I frequently work on 'data science' projects whose data ranges from a few MB to hundreds of GB.

Ideally I'd have a cloud terminal I could connect to that scales its RAM to fit my process's memory usage (and ideally scales CPUs transparently too).

I know you can resize various cloud instances, but preserving runtime state is the problem. I'd like to avoid ever having to kill whatever processes I have running.

Something like Google's Live Migration would also be a good match here, if it allowed migrating to a bigger machine type without rebooting or otherwise losing process state.

Ideally I'm looking for something I could transparently scale up and down, and that I could always SSH into without manually starting or shutting down instances.

Bonus points if GPUs could be added and removed in the same way.
Have you looked into Spark? There are managed Spark options on AWS/GCP (for example Databricks), and Spark lets you do pretty much what you're describing.

You define the minimum/maximum number of nodes and the machine capacity (RAM/CPU), and Spark handles the scaling for you. It gives you a Jupyter-like runtime for working on potentially massive datasets.

That said, Spark may be too much for what you're looking for. Kubernetes combined with Airflow/DBT could also work, for example for ETL/ELT pipelines.
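To make the min/max-nodes idea concrete, here's a minimal PySpark sketch using Spark's dynamic allocation settings. The executor counts, memory sizes, and the S3 path are illustrative assumptions, not recommendations; on a managed platform like Databricks the cluster min/max workers are usually set in the cluster UI instead.

```python
# Hedged sketch: a Spark session that scales its executor count with the workload.
# All values below (1-20 executors, 8g/4 cores, the bucket path) are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("autoscaling-sketch")
    # Let Spark add/remove executors based on pending tasks.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    # Per-executor resources; node sizing/autoscaling is handled by the cloud platform.
    .config("spark.executor.memory", "8g")
    .config("spark.executor.cores", "4")
    # Dynamic allocation needs shuffle tracking (or an external shuffle service).
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .getOrCreate()
)

# The same code works whether the dataset is a few MB or hundreds of GB;
# Spark spreads the partitions across however many executors are currently allocated.
df = spark.read.parquet("s3://my-bucket/events/")  # hypothetical dataset
df.groupBy("user_id").count().show()
```

The key point is that your interactive session (notebook or spark-shell) stays alive while the cluster grows and shrinks underneath it, which is the closest common equivalent to "resize without losing process state."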