
The state of "super fast inference" is frustrating

4 points by 4k 4 months ago

I'm talking about the three providers I know of that claim super-fast inference: Groq, Cerebras, and SambaNova. Every one of them claims extremely fast, multi-hundred-tokens-per-second inference on reasonably large models, and every one of them has a chat demo on its website that seems to confirm the claimed numbers.

However, for many months now, each of these providers has had literally the same API page, where only the free tier with low rate limits is available. Everything else is "Coming Soon". No updates, no dates, no estimates, nothing.

Come to think of it, there is not a single good inference provider in the whole open-source-model space that offers a paid, unthrottled API sustaining over 50 tokens/sec consistently. There's money to be made here, and surprisingly nobody is pursuing it aggressively.
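(If anyone wants to sanity-check a provider's sustained tokens/sec themselves rather than trust the chat demo, here's a rough sketch against an OpenAI-compatible streaming endpoint. The base URL, env var, and model name are placeholders for whichever provider you're testing, and whitespace splitting only approximates token counts, so treat the output as a ballpark, not a benchmark.)

    # Rough sketch: measure sustained decode speed from an
    # OpenAI-compatible streaming chat endpoint.
    import os
    import time

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example-provider.com/v1",  # placeholder endpoint
        api_key=os.environ["PROVIDER_API_KEY"],          # placeholder env var
    )

    stream = client.chat.completions.create(
        model="some-large-open-model",  # placeholder model name
        messages=[{"role": "user", "content": "Write ~500 words about GPUs."}],
        stream=True,
    )

    start = None
    pieces = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            if start is None:
                start = time.monotonic()  # clock starts at first token
            pieces.append(delta)

    if start is None:
        raise SystemExit("no tokens received")

    elapsed = time.monotonic() - start
    approx_tokens = len("".join(pieces).split())  # crude proxy for real tokens
    print(f"~{approx_tokens / elapsed:.0f} tokens/sec sustained decode")

Timing from the first token (rather than from the request) separates decode throughput from queueing and time-to-first-token, which is exactly where throttled free tiers tend to fall over.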

1 comment

imdoxxingme 3 months ago
Cerebras claims they have extremely high demand, presumably from enterprise. So I assume it just hasn't made business sense yet to open up higher tiers to small customers.