科技回声

6 comments

"We don’t actually want them to do math for the sake of replacing calculators" - couldn't the article just end here? People aren't giving it multiplication problems or asking it to count letters because they want to know the answer. Given that you can "patch" the issue by having it invoke Python for computation, the real value is in seeing whether current models can learn to follow a simple step-by-step procedure.

The linked tweet https://twitter.com/yuntiandeng/status/1836114401213989366 is far more interesting to me: GPT models clearly _can_ learn to multiply with intermediate tokens, but even o1 currently doesn't. And yet this would be a case where generating synthetic data is almost trivial. Moreover, being able to perform computations in this fashion would be valuable for many types of benchmarks (e.g. FrontierMath, since I'm sure at the end of the day you'll have to grind through some computation).

So why hasn't it been a priority? I remember some NeurIPS presentation claiming that heavily training on math in this fashion hurt language scores. But then the follow-up would be to have specialized models for each and route between them...
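To make the "synthetic data is almost trivial" point concrete, here is a minimal sketch of a generator for multiplication traces with intermediate steps. The trace format and the names `multiplication_trace` and `make_examples` are assumptions for illustration; this is not the format used in the linked tweet or by any particular model.

```python
import random


def multiplication_trace(a: int, b: int) -> str:
    """Render a * b (non-negative ints) as schoolbook partial products."""
    lines = [f"{a} * {b}"]
    running_total = 0
    # Walk the digits of b from least to most significant.
    for position, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10 ** position
        running_total += partial
        lines.append(
            f"{a} * {digit} * 10^{position} = {partial}; "
            f"running total = {running_total}"
        )
    lines.append(f"answer = {running_total}")
    return "\n".join(lines)


def make_examples(count: int, digits: int, seed: int = 0) -> list[str]:
    """Generate `count` traces for random operands with `digits` digits each."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return [
        multiplication_trace(rng.randint(low, high), rng.randint(low, high))
        for _ in range(count)
    ]


if __name__ == "__main__":
    print(multiplication_trace(12, 34))
```

Because every intermediate line is computed exactly, the traces are correct by construction, so the data can be scaled to any operand length without manual labeling.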

aeonik 5 months ago

The author didn't really cover the reason I use it as a calculator: they are good at translating a request into hard numbers.

Example: "I have a 120 square foot room, and I want to store liquid nitrogen in it. How many liters would it take to displace enough air to be a concern? What kind of CFM should a ventilation system use to clear the room?"

I sanity check the numbers, but it's really nice to have such an interdisciplinary calculator like this.
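For the sanity-check step, the liters part of that question can be sketched by hand. Everything here is an assumption rather than something from the comment: an 8 ft ceiling, a sealed and well-mixed room, a liquid-to-gas expansion ratio of roughly 694 for nitrogen at room temperature, and 19.5% oxygen as the level of concern. The ventilation (CFM) part is left out because it depends on the release rate and on applicable safety guidance.

```python
LITERS_PER_CUBIC_FOOT = 28.3168
AMBIENT_O2 = 0.209      # typical oxygen fraction of air
ALARM_O2 = 0.195        # assumed "concern" threshold
EXPANSION_RATIO = 694   # assumed liquid-to-gas volume ratio for nitrogen


def liquid_liters_to_reach(
    floor_sqft: float,
    ceiling_ft: float = 8.0,      # assumed ceiling height
    target_o2: float = ALARM_O2,
) -> float:
    """Liters of liquid nitrogen that dilute room air down to target_o2."""
    room_liters = floor_sqft * ceiling_ft * LITERS_PER_CUBIC_FOOT
    # Simple dilution model: o2 = AMBIENT_O2 * V_room / (V_room + V_gas),
    # solved for the added gas volume V_gas.
    gas_liters = room_liters * (AMBIENT_O2 / target_o2 - 1.0)
    return gas_liters / EXPANSION_RATIO


if __name__ == "__main__":
    print(f"{liquid_liters_to_reach(120):.1f} L of liquid nitrogen")
```

Under these assumptions the answer comes out a little under 3 liters, which is the kind of order-of-magnitude figure to compare a model's output against.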

SOLAR_FIELDS 5 months ago

> We don’t actually want them to do math for the sake of replacing calculators, we want to understand if they can reason their way to AGI

Speak for yourself. I’d like them to do math for the sake of replacing calculators. Well, not really. But I’d like them to be a really good natural language interface for a calculator.
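One way to picture the "natural language interface for a calculator" split: the model only translates the request into an arithmetic expression, and ordinary code evaluates it. Below is a minimal sketch of the calculator half, assuming the model emits a plain expression string. The name `evaluate_expression` is hypothetical and not part of any LLM tool-calling API.

```python
import ast
import operator

# Only basic arithmetic is allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expression: str):
    """Evaluate +, -, *, / over numeric literals without using eval()."""

    def walk(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    print(evaluate_expression("1234 * 5678"))
```

Walking the syntax tree instead of calling `eval` means names, function calls, and attribute access are refused, so the model's output can only ever trigger arithmetic.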

IAmGraydon 5 months ago

I’m starting to get the feeling that we’ve created a human simulator, not a device for artificial reasoning. It’s a highly searchable database of the aggregate of most publicly accessible human knowledge.

stogot 5 months ago

I saw the use case yesterday of using LLMs for spreadsheets, and I paused to think, “would I ever be foolish enough to trust the output?” I’d have to check everything myself, so what’s the point?

If o4 can’t go beyond 4x4 accurately, then anyone using LLMs for business spreadsheets or science is making a serious mistake.

impure 5 months ago

Yes, but it gives the answer in fancy LaTeX formatting, and it tells you the formulas and sequence of computations that got you there. You need to double-check its answers, but LLMs are right most of the time.

Why are we using LLMs as calculators?
