科技回声 (Tech Echo)

A tech news platform built with Next.js, offering global tech news and discussion.


© 2025 科技回声. All rights reserved.

The state of "super fast inference" is frustrating

4 points | by 4k | 4 months ago
I am talking about the three providers I know of that claim super fast inference: Groq, Cerebras, and SambaNova. Each of them claims inference speeds of many hundreds of tokens per second on reasonably large models, and each has a chat demo on its website that seems to confirm the advertised numbers.

However, for many months now, each of these providers has had essentially the same API page, where only the Free tier with low rate limits is available. Everything else is "Coming Soon". No updates, no dates, no estimates, nothing.

Come to think of it, there is not a single good inference provider in the whole open-source-model space that offers a paid, unthrottled API consistently sustaining over 50 tokens per second. There's money to be made here, and surprisingly nobody is doing it aggressively.
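The throughput claims above are easy to sanity-check against any streaming API yourself. A minimal sketch (the helper names are hypothetical, and whitespace-separated word count is used as a crude proxy for tokens, since real tokenizers vary by model):

```python
import time


def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Throughput in tokens/sec; guards against a zero-length interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return token_count / elapsed_s


def measure_stream(chunks) -> float:
    """Consume an iterable of text chunks (e.g. deltas from a streaming
    chat-completion response) and return a rough tokens/sec figure,
    approximating tokens as whitespace-separated words."""
    start = time.perf_counter()
    n_tokens = 0
    for chunk in chunks:
        n_tokens += len(chunk.split())
    return tokens_per_second(n_tokens, time.perf_counter() - start)
```

Feeding `measure_stream` the text deltas from a provider's streaming endpoint gives a number directly comparable to the "multi hundred tokens per second" figures the chat demos show.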

1 comment

imdoxxingme, 3 months ago
Cerebras claims they have extremely high demand, presumably from enterprise. So I assume it just hasn't made business sense yet to open up higher tiers to small customers.