
Every Flop Counts: Scaling a 300B LLM Without Premium GPUs

117 points | by bretpiatt | about 2 months ago

4 comments

flowerthoughts | about 2 months ago
They never mention what hardware they're on.

Table 1 is the closest thing. Device specs for six devices: 120-989 TFLOPS and 64-96 GB RAM.

An RTX 5090 is about 105 TFLOPS.

https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
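A rough sketch of what that range implies relative to the 5090 figure quoted above, assuming the TFLOPS numbers are at comparable precision (the paper doesn't say):

```python
# Back-of-envelope: Table 1 device range (120-989 TFLOPS) vs. an RTX 5090 (~105 TFLOPS).
rtx_5090_tflops = 105.0
device_range_tflops = (120.0, 989.0)

for tflops in device_range_tflops:
    # Ratio of each end of the quoted range to a single RTX 5090.
    print(f"{tflops:.0f} TFLOPS ~ {tflops / rtx_5090_tflops:.1f}x an RTX 5090")
```

So the devices in Table 1 span roughly 1.1x to 9.4x a 5090 on raw throughput, if the figures are directly comparable.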
rahen | about 2 months ago
I'm pretty surprised by the claimed memory usage for 300B parameters (Table 1). If we compare similar models:

- Llama 3.1 with 405B parameters: 2 TB of memory (FP32), 500 GB (FP8)

- DeepSeek R1 with 671B parameters: 1.3 TB (scaling linearly, around 600 GB for 300B parameters)

Ling claims no more than 96 GB of memory, most likely for inference. That's far more than a 20% reduction. Am I missing something?
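For reference, a minimal weights-only estimate for a dense 300B-parameter model at common precisions (my own back-of-envelope arithmetic, not numbers from the paper; it ignores KV cache and activations):

```python
# Weights-only memory footprint for a dense 300B-parameter model
# at common precisions: params * bytes_per_param.
PARAMS = 300e9

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "FP8/INT8": 1.0,
    "INT4": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gb = PARAMS * nbytes / 1e9
    print(f"{precision:10s}: ~{gb:,.0f} GB")
```

Even at 4-bit that works out to ~150 GB of weights, well above the 96 GB per device in Table 1, which is what makes the claim surprising for a dense model.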
vednig | about 2 months ago
They've shared some interesting optimization techniques for bigger LLMs, that's all; these aren't exactly low-powered devices in terms of power consumption. Still a good read.
osti | about 2 months ago
I think this is the one where they train an LLM without NVIDIA GPUs.