TechEcho
[Ask HN] From days to minutes by using Dask and Spark: what else should I learn?

1 point by george_ciobanu, almost 4 years ago
I often find myself waiting hours (in one case, days) for an ML model to train or for hyperparameter tuning to finish. I recently discovered Dask, and with small changes to my Python code I saved many hours of compute. I only wish I had known about it sooner.

Does anyone else spend hours waiting for their computations to run? If so, I'd love to:

Know which tasks are the most time-consuming for you (in terms of waiting for the CPU to finish)

Learn how you deal with large datasets when doing distributed computing

Feel free to comment below or message me if you'd like to share tips, links, etc. I feel like a noob for not knowing about it until now, and I'm afraid I might be missing other important use cases, tools, etc. Any and all feedback is appreciated!
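For anyone wondering what "small changes" can look like, here is a minimal sketch of the general pattern: run independent hyperparameter trials concurrently, then pick the best one. It is not the exact code from my project. It uses the standard library's `concurrent.futures` rather than Dask so that it runs without extra dependencies; Dask's distributed scheduler offers a similar futures-style interface. The `score` function and its toy loss surface are hypothetical stand-ins for real model training.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import product


def score(params):
    """Hypothetical stand-in for 'train a model and return its validation loss'."""
    lr, depth = params
    # Toy loss surface with its minimum at lr=0.1, depth=4.
    return (lr - 0.1) ** 2 + (depth - 4) ** 2


def grid_search(executor: Executor, learning_rates, depths):
    """Score every combination concurrently and return (best_params, best_loss)."""
    grid = list(product(learning_rates, depths))
    # Each trial is independent of the others, so they can be fanned out.
    losses = list(executor.map(score, grid))
    best_loss, best_params = min(zip(losses, grid))
    return best_params, best_loss


def demo():
    # Threads keep this sketch portable; CPU-bound training would need
    # processes or a cluster to see a real speed-up.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return grid_search(pool, [0.01, 0.1, 1.0], [2, 4, 8])
```

Because `grid_search` only depends on the `Executor` interface, the same function should work unchanged with a `ProcessPoolExecutor` for CPU-bound work.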

1 comment

propelled, almost 4 years ago
The Julia programming language would help speed up computation.

https://julialang.org/benchmarks/

You can use Julia with Apache Spark, and Julia works with Python via PyCall. If you are working with tabular data, the Julia SparkSQL.jl package lets you create Spark apps using just Julia and SQL:

https://github.com/propelledanalytics/SparkSQL.jl

Tutorials:

https://propelledanalytics.github.io/Tutorials/