
FP8 DeepSeek R1 Distilled LLMs for SGLang and VLLM

1 point by jaaron 4 months ago

1 comment

jaaron 4 months ago

These are the FP8 distilled versions of DeepSeek we've started testing at Jam & Tea.

We use LLMs for real-time gameplay. We released Retail Mage last year on Steam as a tech demo of what we can do. We've found FP8 to be the current sweet spot for accuracy and performance for our use case. It's one of the many techniques we applied last year to bring our real-time inference costs down by 3 orders of magnitude to make Retail Mage releasable.

Anyway, we're just sharing this with anyone who finds it useful.

You can read a few more notes by our ML engineer, Yudi, on LinkedIn here:

https://www.linkedin.com/feed/update/urn:li:activity:7290455908306821121/
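For readers who want to try a checkpoint like this, below is a minimal sketch of loading an FP8-quantized model with vLLM's offline inference API. The repo id is a hypothetical placeholder (the post does not name a specific checkpoint), and the sampling settings are illustrative only.

    # Minimal sketch: loading an FP8-quantized distilled checkpoint with vLLM's
    # offline inference API. The model id is a placeholder, not the actual
    # checkpoint shared in this post.
    from vllm import LLM, SamplingParams

    MODEL = "your-org/DeepSeek-R1-Distill-Llama-8B-FP8"  # hypothetical repo id

    # vLLM detects pre-quantized FP8 weights from the checkpoint's quantization config.
    llm = LLM(model=MODEL)

    params = SamplingParams(temperature=0.6, max_tokens=256)
    outputs = llm.generate(["Explain FP8 quantization in one short paragraph."], params)
    print(outputs[0].outputs[0].text)

The same checkpoint could also be served over HTTP with vLLM's or SGLang's server entry points; the offline API is just the shortest path to a smoke test.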