TechEcho

I often find myself waiting hours (in one case, days) for an ML model to train or for hyperparameter tuning to finish. I recently discovered Dask, and with small changes to my Python code I saved many hours of compute. I only wish I had known about it sooner.

Does anyone else spend hours waiting for their computation to run? If yes, I'd love to:

- Know which tasks are the most time-consuming for you (in terms of waiting for the CPU to finish)
- Learn how you deal with large datasets when doing distributed computing

Feel free to comment below or message me if you'd like to share tips, links, etc. I feel like a noob for not knowing about Dask until now, and I'm afraid I might be missing other important use cases or tools. Any and all feedback is appreciated!
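For readers wondering what "small changes" can look like, here is a minimal sketch of the general pattern, not the poster's actual code. It does not use Dask itself; it assumes the standard library's concurrent.futures as a stand-in, and the `score` function, the grid values, and the helper names are all hypothetical. The idea is that hyperparameter trials are independent, so the same search can run serially or across processes by swapping the mapping function.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def score(params):
    """Hypothetical stand-in for an expensive train-and-validate step."""
    lr, depth = params
    # Toy objective (an assumption for illustration): peaks at lr=0.1, depth=4.
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 4) ** 2


def grid_search(score_fn, grid, map_fn=map):
    """Score every combination in `grid`; return (best_params, best_score).

    `map_fn` decides how trials run: the built-in `map` runs them one by one,
    while a process pool's `map` spreads them over several CPU cores.
    """
    candidates = list(product(*grid.values()))
    scores = list(map_fn(score_fn, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return dict(zip(grid.keys(), candidates[best])), scores[best]


def parallel_grid_search(score_fn, grid, workers=None):
    """Same search, with trials distributed over a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return grid_search(score_fn, grid, map_fn=pool.map)


# Hypothetical search space.
GRID = {"lr": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
```

Calling `grid_search(score, GRID)` runs the trials serially; `parallel_grid_search(score, GRID)` runs the same trials in worker processes and is best called from under an `if __name__ == "__main__":` guard. Dask and Spark apply the same idea across a cluster rather than a single machine.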

1 comment

propelled, almost 4 years ago

The Julia programming language would help speed up computation: https://julialang.org/benchmarks/

You can use Julia with Apache Spark, and Julia works with Python via PyCall. If you are working with tabular data, the Julia SparkSQL.jl package lets you create Spark apps using just Julia and SQL: https://github.com/propelledanalytics/SparkSQL.jl

Tutorials: https://propelledanalytics.github.io/Tutorials/


[Ask HN] From days to minutes by using Dask and Spark–what else should I learn?
