
[Ask HN] From days to minutes by using Dask and Spark–what else should I learn?

1 point by george_ciobanu, almost 4 years ago
I often find myself waiting multiple hours (in one case, days) for an ML model to train or for a hyperparameter search to finish. I recently discovered Dask, and with small changes to my Python code I saved many hours of compute. I only wish I had known about it sooner.

Does anyone else spend hours waiting for their computation to run? If so, I'd love to:

Know which tasks are the most time-consuming for you (in terms of waiting for the CPU to finish).

Learn how you deal with large datasets when doing distributed computing.

Feel free to comment below or message me if you'd like to share tips, links, etc. I feel like a noob for not knowing about Dask until now, and I'm afraid I might be missing other important use cases or tools. Any and all feedback is appreciated!
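For anyone wondering what "small changes" can look like, here is a minimal sketch using `dask.delayed`, assuming a hyperparameter grid whose evaluations are independent of each other. The `score` function and the grid values are hypothetical stand-ins for a real train-and-evaluate step; they are not taken from the code described in this post.

```python
import dask
from dask import delayed


def score(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for an expensive train-and-evaluate step."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (depth - 4) ** 2


def grid_search(learning_rates, depths):
    """Score every (learning_rate, depth) pair as an independent Dask task."""
    params = [(lr, d) for lr in learning_rates for d in depths]
    # delayed() only records the call; nothing runs until dask.compute().
    tasks = [delayed(score)(lr, d) for lr, d in params]
    # compute() executes all tasks on Dask's default scheduler
    # (a thread pool for delayed objects) and returns a tuple of results.
    scores = dask.compute(*tasks)
    best_score, best_params = max(zip(scores, params))
    return best_params, best_score


best_params, best_score = grid_search([0.01, 0.1, 1.0], [2, 4, 8])
```

The only change from a plain loop is wrapping the call in `delayed` and collecting the results with `dask.compute`. Note that the default threaded scheduler mostly helps when the work releases the GIL (NumPy, many ML libraries); for pure-Python CPU-bound work, a process-based or distributed scheduler is usually the better fit.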

1 comment

propelled, almost 4 years ago
The Julia programming language could help speed up computation.

https://julialang.org/benchmarks/

You can use Julia with Apache Spark, and Julia works with Python via PyCall. If you are working with tabular data, the Julia SparkSQL.jl package lets you create Spark apps using just Julia and SQL:

https://github.com/propelledanalytics/SparkSQL.jl

Tutorials:

https://propelledanalytics.github.io/Tutorials/