TechEcho
Hidden Rate Limits: How Providers Throttle LLM Throughput During Peak Demand

14 points by carlcortright about 1 year ago

2 comments

carlcortright about 1 year ago
Over the past few days we investigated the main LLM providers and observed up to a 40% difference in average speed (tokens / second) across leading models like GPT-4.
Comment #39804436 not loaded
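The tokens-per-second comparison above can be reproduced with a simple helper: record the arrival time of each streamed token and divide token count by elapsed time. A minimal sketch (the function name and the sample timestamps are illustrative, not from the original investigation; excluding time-to-first-token is one common convention, since queueing delay and generation speed vary independently):

```python
def tokens_per_second(token_timestamps):
    """Decode throughput from per-token arrival times (seconds).

    Measures from the first token onward, so time-to-first-token
    (which mostly reflects queueing, not generation speed) is excluded.
    """
    if len(token_timestamps) < 2:
        raise ValueError("need at least two token timestamps")
    elapsed = token_timestamps[-1] - token_timestamps[0]
    # (n - 1) inter-token gaps over the elapsed window
    return (len(token_timestamps) - 1) / elapsed

# Example: 5 tokens arriving every 50 ms -> roughly 20 tokens/second
stamps = [0.0, 0.05, 0.10, 0.15, 0.20]
rate = tokens_per_second(stamps)
```

In practice the timestamps would come from something like `time.monotonic()` calls inside a streaming-response loop; comparing providers fairly also requires the same prompt, the same output length, and repeated runs at different times of day, since the 40% gap described above is tied to peak demand.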
canada_dry about 1 year ago
Looking ahead, I suspect that as AI becomes even more ubiquitous/mainstream, AI service providers will offer various levels of analysis at different price points. E.g. the cheapest tier will provide reliably accurate answers, but only to simple queries that consume little compute power.

Also envisioned is the all-too-common race-to-the-bottom scenario, where services simply tune their responses to use the least compute power needed while harvesting and capitalizing on their users' data.