
Why don't LLMs ask for calculators?

41 points by 13years, 3 months ago

14 comments

luma, 3 months ago
They can do exactly that; it's called Tool Use, and nearly all modern models can handle it. For example, I have a consumer GPU that can run an R1 Qwen distill which, when prompted for a large multiplication, will elect to write a Python script to find the answer.

This is a table-stakes feature for even the open/free models today.
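A minimal sketch of the tool-use loop the comment describes: the model, instead of computing in-weights, emits a structured tool call that the host program executes. The model's reply is simulated here as a hard-coded JSON string; in a real system that JSON would come from the LLM's response, and the exact message format varies by provider.

```python
# Sketch of host-side tool dispatch for an LLM "calculator" tool.
# The model message below is a stand-in for what a tool-capable model emits.
import ast
import json
import operator

def calculator(expression: str) -> str:
    """Safely evaluate a basic arithmetic expression (no names, no calls)."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow, ast.USub: operator.neg}

    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return ops[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return str(ev(ast.parse(expression, mode="eval").body))

TOOLS = {"calculator": calculator}

def handle_model_output(raw: str) -> str:
    """Dispatch a (simulated) model tool-call message to the named tool."""
    call = json.loads(raw)
    return TOOLS[call["tool"]](**call["arguments"])

# What the model might emit when asked for a large multiplication:
model_message = (
    '{"tool": "calculator",'
    ' "arguments": {"expression": "123456789 * 987654321"}}'
)
print(handle_model_output(model_message))  # 121932631112635269
```

The point of the pattern is that the arithmetic never touches the model's weights: the model only has to produce the call, and the host returns the exact result for the model to quote.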
colonCapitalDee, 3 months ago
Claude Sonnet 3.5 will often use JavaScript as a calculator. It's not perfect at deciding when it should write code, but that's easy to fix by prompting it with "Write some code to help you answer the question."

The post is honestly quite strange. "When LLMs try to do math themselves they often get it wrong" and "LLMs don't use tools" are two entirely different claims! The first claim is true, the second claim is false, and yet the article uses the truth of the first claim as evidence for the second! This does not hold up at all.
PaulHoule, 3 months ago
Many LLMs, particularly coding assistants, use "tools". Here is one with a calculator:

https://githubnext.com/projects/gpt4-with-calc/

and another example:

https://www.pinecone.io/learn/series/langchain/langchain-tools/

LLMs often do a good job at mathy coding. For instance, I told Copilot "i want a python function that computes the collatz sequence for a given starting n and returns it as a list":

```python
def collatz_sequence(n):
    sequence = [n]
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        sequence.append(n)
    return sequence
```

which gives right answers, which I wouldn't count on Copilot being able to do on its own.
crancher, 3 months ago
> Now, some might interject here and say we could, of course, train the LLM to ask for a calculator. However, that would not make them intelligent. Humans require no training at all for calculators, as they are such intuitive instruments.

Does the author really believe humans are born with an innate knowledge of calculators and their use?
Sysreq2, 3 months ago
A lot of people are talking about tool use and writing internal scripts, and yeah, that's kind of an answer. Really, though, I think the author is highlighting that LLMs are not being used efficiently at the present moment.

LLMs are great at certain tasks. Databases are better at certain tasks. Calculators too. While we could continually throw more and more compute at the problem, growing layers and injecting more data, wouldn't it make more sense to just have an LLM call its own back-end calculator agent? When I ask it for obscure information, maybe it should just pull from its own internal encyclopedia database.

Let LLMs do what they do well, but let's not forget the decades that brought us here. Even the smartest human still uses a calculator, so why doesn't an AI? The fact that it writes its own JavaScript is flashy as hell but also completely unnecessary and error-prone.
Workaccount2, 3 months ago
I don't know what happened, but there was a time when GPT-4 could access Wolfram Alpha, and anytime you asked it something beyond the most basic math, it would automatically prompt Wolfram for the answer.
Terr_, 3 months ago
> The LLM has no self-reflection for the knowledge it knows and has no understanding of concepts beyond what can be assembled by patterns in language.

My favorite framing: the LLM is just an ego-less extender of text documents. It is iteratively run against a movie script, which is usually incomplete and ends in: "User says X, and Bot responds with..."

Designers of these systems have (deliberately) tricked consumers into thinking they are talking to the LLM author, rather than supplying mad-libs dialogue for a User character that is in the same fictional room as a Bot character.

The Bot can only speak limitations which are story-appropriate for the character. It only says it's bad at math because lots of people have written lots of words saying the same thing. If you changed its name and description to Mathematician Dracula, it would have dialogue about how it's *awesome* at math but can't handle sunlight, crucifixes, and garlic.

This framing also explains how "prompt injection" and "hallucinations" are not exceptional, but standard core behavior.
scarface_74, 3 months ago
The paid version of ChatGPT has had a built-in Python runtime for well over a year. The [>_] icon links to the Python code that was run:

https://chatgpt.com/share/67b79516-9918-8010-897c-ba061a29847a
DrNosferatu, 3 months ago
I'm surprised that LLMs don't have a hard rule in their system prompt instructing that numeric computations in particular, and any other computations in general, must only be performed via tool use / running Python.
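A hard rule like that can be paired with forced tool choice so the model has no option to compute in-weights. Below is a hedged sketch of such a request in the OpenAI-style chat-completions format; the model name, tool name, and prompt wording are illustrative assumptions, and other providers use different field names.

```python
# Sketch: a system-prompt hard rule plus a forced tool call, in the
# OpenAI chat-completions request shape. Illustrative only; nothing here
# is a specific vendor recommendation.
import json

request = {
    "model": "gpt-4o",  # hypothetical model choice
    "messages": [
        {
            "role": "system",
            "content": (
                "Never perform arithmetic or numeric computation yourself. "
                "For ANY calculation, call the run_python tool and report "
                "only its output."
            ),
        },
        {"role": "user", "content": "What is 123456789 * 987654321?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "run_python",
                "description": "Execute a Python snippet and return stdout.",
                "parameters": {
                    "type": "object",
                    "properties": {"code": {"type": "string"}},
                    "required": ["code"],
                },
            },
        }
    ],
    # Force a call to run_python rather than leaving the choice to the model.
    "tool_choice": {"type": "function", "function": {"name": "run_python"}},
}

# The payload is plain JSON, ready to POST to a chat-completions endpoint.
print(json.dumps(request)[:40])
```

In practice vendors ship similar rules in their hosted system prompts, which is why the paid ChatGPT tiers reach for their Python runtime on math questions.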
sega_sai, 3 months ago
I am puzzled that modern LLMs don't do multiplication the way humans do it, i.e. digit by digit. Surely they can write an algorithm for that, but why can't they perform it?
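The digit-by-digit procedure the comment refers to is the schoolbook algorithm: one single-digit product at a time, with carries. A sketch makes the point that the procedure is purely mechanical, which is what makes its unreliability in next-token prediction surprising.

```python
# Schoolbook long multiplication, digit by digit with explicit carries,
# exactly as taught by hand. Purely illustrative; Python's * is of course
# already exact for integers.
def schoolbook_multiply(a: int, b: int) -> int:
    da = [int(d) for d in str(a)][::-1]  # least-significant digit first
    db = [int(d) for d in str(b)][::-1]
    result = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        carry = 0
        for j, y in enumerate(db):
            # one single-digit multiplication per step, plus carry
            s = result[i + j] + x * y + carry
            result[i + j] = s % 10
            carry = s // 10
        result[i + len(db)] += carry
    while len(result) > 1 and result[-1] == 0:  # strip leading zeros
        result.pop()
    return int("".join(map(str, result[::-1])))

print(schoolbook_multiply(123456789, 987654321))  # 121932631112635269
```

Every intermediate value here is a single digit plus a small carry, so each step is trivially within an LLM's competence; the failure mode is carrying out thousands of such steps without ever dropping one.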
karparov, 3 months ago
And as you see in the responses here, most people miss the point, elect to patch over the aspects in which the lack of intelligence is glaring, and eventually the end product will be so hard to distinguish from actual intelligence that it's deemed "good enough".

Is that bad? I don't know. If you hoped that real AGI would eventually solve humanity's biggest problems and questions, perhaps so. But if you want something that really, really *looks* like AGI, except to some nerds who still say "well, actually", then it's going to be good enough for most. And certainly sufficient for ending up in the dystopia from that movie clip at the end.
ThrowawayTestr, 3 months ago
I just ask ChatGPT to use a script to calculate an answer.
tombert, 3 months ago
Umm, they do, though? When I use ChatGPT, it will phone out to Wolfram Alpha to compute numbers and the like.
behnamoh, 3 months ago
This is copium; the author doesn't have a good grasp of LLMs. You can't simply "ask" a language model whether it knows it's bad at math and then conclude that the response actually reflects the knowledge encapsulated in the model... sigh...