
DisTrO – a family of low latency distributed optimizers

88 points by SchwKatze 9 months ago

4 comments

arjvik 9 months ago
There's no information about what this is, beyond a teaser of a loss graph. Really hoping this is something that gets released to the world, not hidden behind closed doors.
logicchains 9 months ago
I'd love to believe it's true but I suspect they're overstating the result, or it's a fluke. Presumably teams at large firms like Meta would have put a lot of effort into checking whether not-synchronise-every-step training could match synchronise-every-step training before investing hundreds of millions of dollars into the low-latency, high-throughput network hardware necessary for the latter.
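The distinction drawn above can be made concrete with a toy experiment. Below is a minimal NumPy sketch, not DisTrO's actual algorithm and with all names chosen purely for illustration, comparing synchronise-every-step gradient averaging against local updates with only periodic parameter averaging (in the spirit of local SGD):

```python
# Toy comparison of two data-parallel training regimes (illustrative only,
# not DisTrO): (a) average gradients across workers every step, versus
# (b) let each worker step locally and average parameters infrequently.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, steps, lr, sync_every = 4, 20, 300, 0.005, 10

# Each simulated worker holds its own data shard (A_k, b_k).
A = [rng.normal(size=(50, dim)) for _ in range(n_workers)]
b = [rng.normal(size=50) for _ in range(n_workers)]

def grad(x, A_k, b_k):
    # Gradient of the local objective 0.5 * ||A_k x - b_k||^2.
    return A_k.T @ (A_k @ x - b_k)

def total_loss(x):
    return sum(0.5 * np.sum((A_k @ x - b_k) ** 2) for A_k, b_k in zip(A, b))

# (a) Synchronise every step: gradients are averaged across all workers
# before every single update (the all-reduce-per-step regime).
x = np.zeros(dim)
for _ in range(steps):
    g = np.mean([grad(x, A[k], b[k]) for k in range(n_workers)], axis=0)
    x -= lr * g
print("sync-every-step loss:", total_loss(x))

# (b) Local updates with periodic averaging: each worker steps on its own
# shard, and parameters are exchanged only every `sync_every` steps, so the
# number of communication rounds drops by roughly that factor.
xs = [np.zeros(dim) for _ in range(n_workers)]
for t in range(steps):
    xs = [xs[k] - lr * grad(xs[k], A[k], b[k]) for k in range(n_workers)]
    if (t + 1) % sync_every == 0:
        avg = np.mean(xs, axis=0)
        xs = [avg.copy() for _ in range(n_workers)]
print("periodic-sync loss:  ", total_loss(np.mean(xs, axis=0)))
```

The sketch only shows the difference in communication pattern; whether anything like the second regime can actually match per-step synchronisation at LLM scale is exactly the question raised in the comment above.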
iamronaldo 9 months ago
This seems huge, no? Couldn't this enable "community-based" AI training at home?
simonw 9 months ago
Most of the information about this is in this PDF (I hate when people publish interesting information exclusively in PDFs): https://raw.githubusercontent.com/NousResearch/DisTrO/main/A_Preliminary_Report_on_DisTrO.pdf

I converted it to Markdown (using Gemini 1.5 Pro) and pasted it into a Gist here: https://gist.github.com/simonw/46a33d66e069efe5c10b63625fdabb4e#a-preliminary-report-on-distro

From the abstract:

> Training large scale neural networks typically involves sharing gradients between all accelerators, which necessitates specialized, high-speed interconnects. To address this, we introduce DisTrO, a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by four to five orders of magnitude without relying on amortized analysis, enabling low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware.

This could be a HUGE deal.

Currently if you want to train giant LLMs you need a big pile of GPUs in the same location as each other due to the amount of information that needs to shuffle between them during training.

If DisTrO works as intended, it will be possible to train models using GPUs in different places - potentially enabling SETI@home style training where thousands of people with gaming PCs at home could donate their GPU time to a large training effort.

Their tweet about this has more: https://twitter.com/NousResearch/status/1828121648383566270

> Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of architecture-agnostic and network-agnostic distributed optimizers that reduces the inter-GPU communication requirements by 1000x to 10,000x without relying on amortized analysis, and matches AdamW+All-Reduce in convergence rates. This enables low-latency training of large neural networks on slow internet bandwidths with heterogeneous networking hardware.

> DisTrO can increase the resilience and robustness of training LLMs by minimizing dependency on a single entity for computation. DisTrO is one step towards a more secure and equitable environment for all participants involved in building LLMs.

> Without relying on a single company to manage and control the training process, researchers and institutions can have more freedom to collaborate and experiment with new techniques, algorithms, and models. This increased competition fosters innovation, drives progress, and ultimately benefits society as a whole.
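To get a feel for what a "1000x to 10,000x" communication reduction would mean in practice, here is a back-of-the-envelope calculation in Python. All of the numbers (a 1B-parameter model, fp16 gradients, a 25 Mbit/s home uplink) are assumptions chosen for illustration, not figures from the report:

```python
# Illustrative estimate only: assumed model size, gradient precision and
# uplink speed; none of these numbers come from the DisTrO report.
params = 1e9                      # assumed: 1B parameters
bytes_per_value = 2               # assumed: fp16 gradients
naive_bytes_per_step = params * bytes_per_value  # naive per-step payload per worker
uplink_bytes_per_s = 25e6 / 8     # assumed: 25 Mbit/s home upload speed

for reduction in (1, 1_000, 10_000):
    payload = naive_bytes_per_step / reduction
    upload_s = payload / uplink_bytes_per_s
    print(f"{reduction:>6}x reduction: {payload / 1e6:10.3f} MB/step, "
          f"~{upload_s:7.2f} s to upload per step")
```

Under these assumptions, shipping a full fp16 gradient over a home uplink would take on the order of ten minutes per step, while a 1,000x to 10,000x reduction brings it to well under a second, which is why the SETI@home comparison comes up.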