
Ask HN: How expensive are LLMs to query, really?

5 points | by teach | 5 days ago

I'm starting to see things pop up from well-meaning people worried about the environmental cost of large language models. Just yesterday I saw a meme on social media that suggested that "ChatGPT uses 1-3 bottles of water for cooling for every query you put into it."

This seems unlikely to me, but what is the truth?

I understand that _training_ an LLM is very, very expensive. (Although so is spinning up a fab for a new CPU.) But it seems to me the incremental costs to query a model should be relatively low.

I'd love to see your back-of-the-envelope calculations for how much water and especially how much electricity it takes to "answer a single query" from, say, ChatGPT, Claude-3.7-Sonnet or Gemini Flash. Bonus points if you compare it to watching five minutes of a YouTube video or doing a Google search.

Links to sources would also be appreciated.
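As a starting point, here is a minimal sketch of the kind of back-of-the-envelope calculation the post asks for, in Python. Every figure in it (accelerator power draw, seconds of generation per response, data-center PUE, and water usage effectiveness) is a placeholder assumption for illustration, not a measured value for ChatGPT, Claude, or Gemini; swapping in better-sourced numbers for those four inputs is all that is needed to turn it into a real estimate.

```python
# Back-of-the-envelope per-query energy and water estimate.
# Every number below is a placeholder assumption for illustration,
# not a measured figure for any particular model or provider.

gpu_power_w = 700        # assumed accelerator draw while generating (W)
seconds_per_query = 5    # assumed generation time for one response (s)
pue = 1.2                # assumed data-center power usage effectiveness
wue_l_per_kwh = 1.8      # assumed water usage effectiveness (litres per kWh)

# Energy at the wall for one query, in watt-hours and kilowatt-hours.
energy_wh = gpu_power_w * seconds_per_query / 3600 * pue
energy_kwh = energy_wh / 1000

# Cooling water attributable to that energy.
water_ml = energy_kwh * wue_l_per_kwh * 1000

print(f"~{energy_wh:.2f} Wh and ~{water_ml:.1f} ml of water per query "
      f"(under these assumptions)")
```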

2 comments

serendipty01 | 5 days ago

Some links:

https://www.sustainabilitybynumbers.com/p/carbon-footprint-chatgpt

https://andymasley.substack.com/p/a-cheat-sheet-for-conversations-about

(discussion on lobste.rs - https://lobste.rs/s/bxixuu/cheat_sheet_for_why_using_chatgpt_is_not)

(discussion on HN, 320 comments: https://news.ycombinator.com/item?id=42745847)
(Reply #43940477 not loaded.)
a_conservative | 5 days ago

My M4 Max MacBook can run local inference on a medium-ish Gemini model (32B, IIRC). The power consumption spikes by about 120 watts over idle (with multiple Electron apps, Docker, etc.). It runs about 70 tokens/sec and usually responds within 10 to 20 seconds.

So, picking some numbers for calculation: 4 answers per minute @ 120 watts is about 0.5 watt-hours per answer. ~200 responses would be enough to drain the (normally quite long-lasting) battery.

How does that compare to the more common Nvidia GPUs? I don't know.
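Restating that comment's arithmetic as a runnable check: the 120 W spike over idle and the 10-20 second response time are the commenter's own figures, while the roughly 100 Wh battery capacity is an assumed value for a recent MacBook Pro, since the comment does not state it.

```python
# Reproducing the back-of-the-envelope numbers from the comment above.
# The 120 W spike and 10-20 s response time are the commenter's figures;
# the ~100 Wh battery capacity is an assumption (typical large MacBook Pro).

extra_power_w = 120        # draw over idle while generating (commenter's figure)
seconds_per_answer = 15    # midpoint of the 10-20 s range, i.e. ~4 answers/minute
battery_wh = 100           # assumed usable battery capacity

wh_per_answer = extra_power_w * seconds_per_answer / 3600
answers_per_charge = battery_wh / wh_per_answer

print(f"{wh_per_answer:.2f} Wh per answer, ~{answers_per_charge:.0f} answers per charge")
# -> 0.50 Wh per answer, ~200 answers, matching the comment's estimate.
```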