A PyTorch Approach to ML Infrastructure

113 points by donnygreenberg, almost 2 years ago

9 comments

extr, almost 2 years ago
Very interesting. I just worked to implement a baby version of this kind of system at work. Similar to this project, our basic use case was allowing researchers to quickly/easily execute their arbitrary R&D code on cloud resources. It's difficult to know in advance what they might be doing, and we wanted to avoid a situation where they are pushing a docker container or submitting a file every time they change something. So we made it possible for them to "just" ship a single class/function without leaving their local interactive environment.

I see from looking at the source here, run.house is using the same approach of cloudpickling the function. That works, but one struggle we are having is that it's quite brittle. It's all gravy assuming everyone is operating in perfectly fresh environments that mirror the cluster, but this is rarely the case. Even subtle changes in the local execution environment can produce segfaults when the code runs on the server. Very hard to debug. The code here looks a lot more mature, so I'm assuming this is more robust than what we have. But I would be curious whether the developers have run into similar challenges.
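To make the "ship a single function" idea in the comment above concrete, here is a minimal sketch that uses only the Python standard library. It is not Runhouse's code and not what cloudpickle does internally; it only imitates the by-value idea by serializing a function's bytecode with `marshal` and rebuilding it in a fresh namespace. The helper names `ship` and `receive` are hypothetical.

```python
import builtins
import marshal
import types


def ship(func):
    """Serialize the function by value: its name and compiled bytecode."""
    return marshal.dumps((func.__name__, func.__code__))


def receive(payload):
    """Rebuild the function on the 'remote' side in a fresh namespace.

    Assumption: the receiver runs the same Python version as the sender,
    because the marshal format and bytecode are version-specific.
    """
    name, code = marshal.loads(payload)
    fresh_globals = {"__builtins__": builtins}
    return types.FunctionType(code, fresh_globals, name)


def scale(values, factor):
    # Uses only its arguments and builtins, so it survives the round trip.
    return [v * factor for v in values]


payload = ship(scale)            # bytes that could be sent over the network
remote_scale = receive(payload)  # a new function object built from the bytes
result = remote_scale([1, 2, 3], 3)
print(result)
```

The round trip only works cleanly when both sides agree on the interpreter version and on everything the function touches, which is one plausible source of the brittleness the comment describes.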
cbarrick, almost 2 years ago
> Just as PyTorch lets you send a model .to("cuda"), Runhouse enables hardware heterogeneity by letting you send your code (or dataset, environment, pipeline, etc) .to("cloud_instance", "on_prem", "data_store"...), all from inside a Python notebook or script. There's no need to manually move the code and data around, package into docker containers, or translate into a pipeline DAG.

From an SRE perspective, this sounds like a nightmare. Controlled releases are *really* important for reliability. I definitely don't want my devs doing manual rollouts from a notebook.
m_ke, almost 2 years ago
Since people are suggesting alternatives, I'd like to shout out skypilot: https://github.com/skypilot-org/skypilot

EDIT: looks like this actually uses it under the hood: https://github.com/run-house/runhouse/blob/main/requirements.txt#L8
voz_, almost 2 years ago
This is a cool approach. I really like the notion of small, powerful components that compose well together. ML infra is sorely missing this piece. I wish you the best of luck!
guluarte, almost 2 years ago
Sounds similar to https://dstack.ai/docs/
ipsum2, almost 2 years ago
> Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body)

This seems like a major limitation and pretty antithetical to the PyTorch approach.
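The restriction quoted above can be illustrated with a small standard-library sketch. It is an assumption about why such a rule might exist, not a description of Runhouse's implementation: a function is re-created with empty globals, imitating a remote process that received only the function's code. The helper `run_in_clean_namespace` and both `area_*` functions are hypothetical.

```python
import builtins
import math
import types


def area_global_import(r):
    # Relies on the module-level `import math` above.
    return math.pi * r * r


def area_local_import(r):
    # Self-contained: the import happens inside the function body.
    import math
    return math.pi * r * r


def run_in_clean_namespace(func, *args):
    """Rebuild func with empty globals, then call it with args."""
    clean_globals = {"__builtins__": builtins}
    clone = types.FunctionType(func.__code__, clean_globals, func.__name__)
    return clone(*args)


try:
    run_in_clean_namespace(area_global_import, 1.0)
    global_import_ok = True
except NameError:
    # `math` is not defined in the clean namespace.
    global_import_ok = False

local_import_result = run_in_clean_namespace(area_local_import, 1.0)
print(global_import_ok, local_import_result)
```

Only the version with the import inside the body still works once the surrounding module is gone, which is the trade-off the comment objects to.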
chenzhekl, almost 2 years ago
How does Runhouse compare with Ray, which also simplifies distributed computing?
pavelstoev, almost 2 years ago
Have you tried Hidet? https://pypi.org/project/hidet/
nologic01, almost 2 years ago
How would you position this vs the Modular/Mojo approach, which aims to relieve similar pain points?