
Why don't LLMs ask for calculators?

41 points by 13years, 3 months ago

14 comments

luma, 3 months ago
They can do exactly that; it's called Tool Use, and nearly all modern models can handle it. For example, I have a consumer GPU that can run an R1 Qwen distill which, when prompted for a large multiplication, will elect to write a Python script to find the answer.

This is a table-stakes feature for even the open/free models today.
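A minimal sketch of the tool-use loop the comment describes: the model, instead of computing in-weights, emits a structured tool call that the host program executes. The model's reply is simulated here as a hard-coded JSON string; in a real system that JSON would come from the LLM's response, and the exact message format varies by provider.

```python
# Sketch of host-side tool dispatch for an LLM "calculator" tool.
# The model message below is a stand-in for what a tool-capable model emits.
import ast
import json
import operator

def calculator(expression: str) -> str:
    """Safely evaluate a basic arithmetic expression (no names, no calls)."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return ops[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expression, mode="eval").body))

TOOLS = {"calculator": calculator}

def handle_model_output(raw: str) -> str:
    """Dispatch a (simulated) model tool-call message to the named tool."""
    call = json.loads(raw)
    return TOOLS[call["tool"]](**call["arguments"])

# What the model might emit when asked for a large multiplication:
model_message = (
    '{"tool": "calculator",'
    ' "arguments": {"expression": "123456789 * 987654321"}}'
)
print(handle_model_output(model_message))  # 121932631112635269
```

The point of the pattern is that the arithmetic never touches the model's weights: the model only has to produce the call, and the host returns the exact result for the model to quote.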
colonCapitalDee, 3 months ago
Claude Sonnet 3.5 will often use JavaScript as a calculator. It's not perfect at deciding when it should write code, but that's easy to fix by prompting it with "Write some code to help you answer the question."

The post is honestly quite strange. "When LLMs try to do math themselves they often get it wrong" and "LLMs don't use tools" are two entirely different claims! The first claim is true, the second claim is false, and yet the article uses the truth of the first claim as evidence for the second! This does not hold up at all.
PaulHoule, 3 months ago
Many LLMs, particularly coding assistants, use "tools". Here is one with a calculator:

https://githubnext.com/projects/gpt4-with-calc/

and another example:

https://www.pinecone.io/learn/series/langchain/langchain-tools/

LLMs often do a good job at mathy coding. For instance, I told Copilot "i want a python function that computes the collatz sequence for a given starting n and returns it as a list":

```python
def collatz_sequence(n):
    sequence = [n]
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        sequence.append(n)
    return sequence
```

which gives right answers, which I wouldn't count on Copilot being able to do on its own.
crancher, 3 months ago
> Now, some might interject here and say we could, of course, train the LLM to ask for a calculator. However, that would not make them intelligent. Humans require no training at all for calculators, as they are such intuitive instruments.

Does the author really believe humans are born with an innate knowledge of calculators and their use?
Sysreq2, 3 months ago
A lot of people are talking about tool use and writing internal scripts, and yeah, that's kind of an answer. Really, though, I think the author is highlighting that LLMs are not being used efficiently at the present moment.

LLMs are great at certain tasks. Databases are better at certain tasks. Calculators too. While we could continually throw more and more compute at the problem, growing layers and injecting more data, wouldn't it make more sense to just have an LLM call its own back-end calculator agent? When I ask it for obscure information, maybe it should just pull from its own internal encyclopedia database.

Let LLMs do what they do well, but let's not forget the decades that brought us here. Even the smartest human still uses a calculator, so why doesn't an AI? The fact that it writes its own JavaScript is flashy as hell but also completely unnecessary and error-prone.
Workaccount2, 3 months ago
I don't know what happened, but there was a time when GPT-4 could access Wolfram Alpha, and anytime you asked it something beyond the most basic math, it would automatically prompt Wolfram for the answer.
Terr_, 3 months ago
> The LLM has no self-reflection for the knowledge it knows and has no understanding of concepts beyond what can be assembled by patterns in language.

My favorite framing: the LLM is just an ego-less extender of text documents. It is iteratively run against a movie script, which is usually incomplete and ends in: "User says X, and Bot responds with..."

Designers of these systems have (deliberately) tricked consumers into thinking they are talking to the LLM author, rather than supplying mad-libs dialogue for a User character that is in the same fictional room as a Bot character.

The Bot can only speak limitations which are story-appropriate for the character. It only says it's bad at math because lots of people have written lots of words saying the same thing. If you changed its name and description to Mathematician Dracula, it would have dialogue about how it's *awesome* at math but can't handle sunlight, crucifixes, and garlic.

This framing also explains how "prompt injection" and "hallucinations" are not exceptional, but standard core behavior.
scarface_74, 3 months ago
The paid version of ChatGPT has had a built-in Python runtime for well over a year. The [>_] icon links to the Python code that was run:

https://chatgpt.com/share/67b79516-9918-8010-897c-ba061a29847a
DrNosferatu, 3 months ago
I'm surprised that LLMs don't have a hard rule in their system prompt instructing that numeric computations in particular, and any other computations in general, must only be performed via tool use / running Python.
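A hard rule like that can be paired with forced tool choice so the model has no option to compute in-weights. Below is a hedged sketch of such a request in the OpenAI-style chat-completions format; the model name, tool name, and prompt wording are illustrative assumptions, and other providers use different field names.

```python
# Sketch: a system-prompt hard rule plus a forced tool call, in the
# OpenAI chat-completions request shape. Illustrative only; nothing here
# is a specific vendor recommendation.
import json

request = {
    "model": "gpt-4o",  # hypothetical model choice
    "messages": [
        {
            "role": "system",
            "content": (
                "Never perform arithmetic or numeric computation yourself. "
                "For ANY calculation, call the run_python tool and report "
                "only its output."
            ),
        },
        {"role": "user", "content": "What is 123456789 * 987654321?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_python",
                "description": "Execute a Python snippet and return stdout.",
                "parameters": {
                    "type": "object",
                    "properties": {"code": {"type": "string"}},
                    "required": ["code"],
                },
            },
        }
    ],
    # Force a call to run_python rather than leaving the choice to the model.
    "tool_choice": {"type": "function", "function": {"name": "run_python"}},
}

# The payload is plain JSON, ready to POST to a chat-completions endpoint.
print(json.dumps(request)[:40])
```

In practice vendors ship similar rules in their hosted system prompts, which is why the paid ChatGPT tiers reach for their Python runtime on math questions.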
sega_sai, 3 months ago
I am puzzled that modern LLMs don't do multiplication the way humans do it, i.e. digit by digit. Surely they can write an algorithm for that, but why can't they perform it?
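The digit-by-digit procedure the comment refers to is the schoolbook algorithm: one single-digit product at a time, with carries. A sketch makes the point that the procedure is purely mechanical, which is what makes its unreliability in next-token prediction surprising.

```python
# Schoolbook long multiplication, digit by digit with explicit carries,
# exactly as taught by hand. Purely illustrative; Python's * is of course
# already exact for integers.
def schoolbook_multiply(a: int, b: int) -> int:
    da = [int(d) for d in str(a)][::-1]  # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    result = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        carry = 0
        for j, y in enumerate(db):
            # one single-digit multiplication per step, plus carry
            s = result[i + j] + x * y + carry
            result[i + j] = s % 10
            carry = s // 10
        result[i + len(db)] += carry
    while len(result) > 1 and result[-1] == 0:  # strip leading zeros
        result.pop()
    return int("".join(map(str, result[::-1])))

print(schoolbook_multiply(123456789, 987654321))  # 121932631112635269
```

Every intermediate value here is a single digit plus a small carry, so each step is trivially within an LLM's competence; the failure mode is carrying out thousands of such steps without ever dropping one.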
karparov, 3 months ago
And as you see in the responses here, most people miss the point, elect to patch over the aspects in which the lack of intelligence is glaring, and eventually the end product will be so hard to distinguish from actual intelligence that it's deemed "good enough".

Is that bad? I don't know. If you hoped that real AGI would eventually solve humanity's biggest problems and questions, perhaps so. But if you want something that really, really *looks* like AGI, except to some nerds who still say "well, actually", then it's going to be good enough for most. And certainly sufficient for ending up in the dystopia from that movie clip at the end.
ThrowawayTestr, 3 months ago
I just ask ChatGPT to use a script to calculate an answer.
tombert, 3 months ago
Umm, they do, though? When I use ChatGPT, it will phone out to Wolfram Alpha to compute numbers and the like.
behnamoh, 3 months ago
This is copium; the author doesn't have a good grasp of LLMs. You can't simply "ask" a language model whether it knows it's bad at math and then conclude that the response actually reflects the knowledge encapsulated in the model... sigh...