Why are we using LLMs as calculators?

20 points by fzliu, 5 months ago

6 comments

krackers, 5 months ago
"We don’t actually want them to do math for the sake of replacing calculators": couldn't the article just end here? People aren't giving it multiplication problems or asking it to count letters because they want to know the answer. Given that you can "patch" the issue by having it invoke Python for computation, the real value is in seeing whether current models can learn to follow a simple step-by-step procedure.

The linked tweet https://twitter.com/yuntiandeng/status/1836114401213989366 is far more interesting to me: GPT models clearly _can_ learn to multiply with intermediate tokens, but even o1 currently doesn't. And yet this would be a case where generating synthetic data is almost trivial. Moreover, being able to perform computations in this fashion would be valuable for many types of benchmarks (e.g. FrontierMath, since I'm sure at the end of the day you'll have to grind through some computation).

So why hasn't it been a priority? I remember some NeurIPS presentation claiming that heavily training on math in this fashion hurt language scores. But then the follow-up would be to have specialized models for each and route between them...
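
To illustrate the remark that synthetic data for step-by-step multiplication is almost trivial to generate, here is a minimal sketch. It is not the method from the linked tweet; the trace format and the names `multiplication_trace` and `make_dataset` are assumptions made up for this example, and it only handles non-negative integers.

```python
import random


def multiplication_trace(a: int, b: int) -> str:
    """Return a long-multiplication worked example for a * b as plain text.

    The text layout is a hypothetical format chosen for illustration.
    """
    lines = [f"{a} * {b}"]
    partials = []
    # Walk the digits of b from least to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10 ** place
        partials.append(partial)
        lines.append(f"{a} * {digit} * 10^{place} = {partial}")
    # Accumulate the partial products one at a time.
    running = 0
    for partial in partials:
        lines.append(f"{running} + {partial} = {running + partial}")
        running += partial
    lines.append(f"answer: {running}")
    return "\n".join(lines)


def make_dataset(n_examples: int, n_digits: int, seed: int = 0) -> list[str]:
    """Generate n_examples traces for random n_digits-by-n_digits products."""
    rng = random.Random(seed)
    low, high = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [
        multiplication_trace(rng.randint(low, high), rng.randint(low, high))
        for _ in range(n_examples)
    ]


if __name__ == "__main__":
    print(multiplication_trace(123, 45))
```

Because every intermediate line is computed exactly, each example is correct by construction, which is what makes this kind of data cheap to produce at any digit length.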
aeonik, 5 months ago
The author didn't really cover the reason I use it as a calculator: they are good at translating a request into hard numbers.

Example: "I have a 120 square foot room, and I want to store liquid nitrogen in it. How many liters would it take to displace enough air to be a concern? What kind of CFM should a ventilation system use to clear the room?"

I sanity check the numbers, but it's really nice to have such an interdisciplinary calculator like this.
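
The kind of sanity check mentioned above can be sketched in a few lines. Everything beyond the 120 square foot floor area is an assumption not stated in the comment: an 8 ft ceiling, a liquid-to-gas expansion ratio of roughly 694 for nitrogen near room temperature, 19.5% oxygen as the level of concern, a simple model in which the nitrogen gas replaces an equal volume of well-mixed air, and a placeholder air-change rate. The function names are hypothetical, and the numbers are a back-of-the-envelope estimate rather than a ventilation design.

```python
CUBIC_FEET_TO_LITERS = 28.3168  # liters in one cubic foot
LN2_EXPANSION_RATIO = 694.0     # assumed liters of gas per liter of liquid N2
NORMAL_O2_FRACTION = 0.209      # oxygen fraction of ordinary air
O2_CONCERN_FRACTION = 0.195     # assumed oxygen-deficiency threshold


def liters_ln2_to_reach(o2_target: float,
                        floor_area_sqft: float,
                        ceiling_height_ft: float = 8.0) -> float:
    """Liters of liquid nitrogen that would lower room oxygen to o2_target.

    Displacement model (an assumption): the evaporated nitrogen replaces an
    equal volume of well-mixed room air, with no ventilation or leakage.
    """
    room_liters = floor_area_sqft * ceiling_height_ft * CUBIC_FEET_TO_LITERS
    displaced_fraction = 1.0 - o2_target / NORMAL_O2_FRACTION
    gas_liters = displaced_fraction * room_liters
    return gas_liters / LN2_EXPANSION_RATIO


def cfm_for_air_changes(floor_area_sqft: float,
                        ceiling_height_ft: float = 8.0,
                        air_changes_per_hour: float = 6.0) -> float:
    """Airflow in cubic feet per minute for a given air-change rate.

    The default of 6 air changes per hour is only a placeholder.
    """
    room_cubic_feet = floor_area_sqft * ceiling_height_ft
    return room_cubic_feet * air_changes_per_hour / 60.0


if __name__ == "__main__":
    liters = liters_ln2_to_reach(O2_CONCERN_FRACTION, 120.0)
    print(f"Liquid nitrogen to reach the concern level: {liters:.1f} L")
    print(f"Airflow for the placeholder rate: {cfm_for_air_changes(120.0):.0f} CFM")
```

Under these assumptions only a few liters of liquid are needed, which is the sort of order-of-magnitude result that is easy to cross-check against an LLM's answer.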
SOLAR_FIELDS, 5 months ago
> We don’t actually want them to do math for the sake of replacing calculators, we want to understand if they can reason their way to AGI

Speak for yourself. I’d like them to do math for the sake of replacing calculators. Well, not really. But I’d like them to be a really good natural language interface for a calculator.
IAmGraydon, 5 months ago
I’m starting to get the feeling that we’ve created a human simulator, not a device for artificial reasoning. It’s a highly searchable database of the aggregate of most publicly accessible human knowledge.
stogot, 5 months ago
Yesterday I saw the use case of using LLMs for spreadsheets, and I paused to ask myself, “would I ever be foolish enough to trust the output?” I’d have to check everything myself, so what’s the point?

If o4 can’t go beyond 4x4 accurately, then anyone using LLMs for business spreadsheets or science is making a serious mistake.
impure, 5 months ago
Yes, but it gives the answer in fancy LaTeX formatting, and it tells you the formulas and sequence of computations that got you there. You need to double-check its answers, but LLMs are right most of the time.