
Show HN: Run and fine-tune 175B+ LMs in Colab using a P2P network of GPUs

3 points by borzunov over 2 years ago
Hi everyone! We made a library for inference/fine-tuning of open 175B+ language models (like BLOOM) at home without having high-end GPUs. You join forces with other people over the Internet (BitTorrent-style), each running a small part of the model layers. Check it out in Colab:

https://colab.research.google.com/drive/1Ervk6HPNS6AYVr3xVdQnY5a-TjjmLCdQ?usp=sharing

Thing is, even though BLOOM weights were publicly released, it was extremely difficult to run inference efficiently unless you had lots of hardware to load the entire model into GPU memory (you need at least 3x A100 or 8x 3090 GPUs). E.g., in the case of offloading, you can only reach a speed of ~10 sec/step for sequential (non-parallel) generation. A possible alternative is to use APIs, but they are paid and not always flexible (you can't adopt new fine-tuning/sampling methods or take a look at hidden states). So, Petals comes to the rescue!

Please share what you think of it!
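For a sense of what "running a small part of model layers" looks like from the client side, here is a minimal sketch based on the Petals README from around that time; the swarm model name ("bigscience/bloom-petals") and the exact import path may differ between versions, so treat this as illustrative rather than canonical:

    from transformers import BloomTokenizerFast
    from petals import DistributedBloomForCausalLM  # pip install petals

    # This model name points the client at the public swarm of volunteer
    # peers, each serving a slice of BLOOM's transformer blocks.
    MODEL_NAME = "bigscience/bloom-petals"
    tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

    # Embeddings run locally; each forward pass through the transformer
    # blocks is routed over the network to peers holding those layers.
    inputs = tokenizer('A cat in French is "', return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=3)
    print(tokenizer.decode(outputs[0]))

Because the model object behaves like a regular Hugging Face causal LM, the hidden states stay accessible and you can attach trainable adapters locally, which is what makes the fine-tuning and custom-sampling use cases mentioned above possible.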

No comments yet