TechEcho

GPT-3.5/4 response times are linear with output tokens

9 points by marcklingen over 1 year ago

1 comment

seeknotfind over 1 year ago
It's funny seeing this measured experimentally, without an explanation, on a .ai site. Transformer models have to be rerun for each additional output token, and each run is a pass through the same fixed-size model, so generation time is linear in the number of output tokens. This is the expected result.

Seeing anything different would point to an advancement. If someone discovered how to do this sublinearly, a company like OpenAI might even want to hide it by artificially delaying responses to preserve its cost advantage.
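To make the argument concrete, here is a minimal sketch of an autoregressive decoding loop. It is not how OpenAI serves its models: `toy_forward` is a hypothetical stand-in for one forward pass, and the overhead and per-token timings are made-up numbers chosen only for illustration. It also treats each pass as costing the same, which ignores the way attention cost grows with context length. The point is just that the loop calls the model once per output token, so latency under this model is a fixed overhead plus a constant times the output length.

```python
from typing import Callable, List, Tuple


def toy_forward(tokens: List[int]) -> int:
    """Hypothetical stand-in for one forward pass of a fixed-size model.

    A real model would return the next token from a learned distribution;
    this just derives a deterministic number from the context.
    """
    return (sum(tokens) + 1) % 50


def generate(
    prompt: List[int],
    n_new: int,
    forward: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Generate n_new tokens one at a time and count the forward passes."""
    tokens = list(prompt)
    passes = 0
    for _ in range(n_new):
        # One model call per output token: this is the source of linearity.
        tokens.append(forward(tokens))
        passes += 1
    return tokens, passes


def estimated_latency(n_new: int, overhead_s: float, per_token_s: float) -> float:
    """Linear latency model (assumed): fixed overhead plus a per-token cost."""
    return overhead_s + per_token_s * n_new


if __name__ == "__main__":
    # The 0.5 s overhead and 0.03 s per token are illustrative, not measured.
    for n in (10, 20, 40):
        _, passes = generate([1, 2, 3], n, toy_forward)
        print(n, passes, estimated_latency(n, 0.5, 0.03))
```

Doubling `n_new` doubles the number of forward passes, which is the behavior the linked measurement would be expected to show.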