Procedural knowledge in pretraining drives reasoning in large language models

248 points | by reqo | 5 months ago

13 comments

rors, 5 months ago
It seems obvious to me that LLMs wouldn't be able to find examples of every single problem posed to them in the training data. There wouldn't be enough examples for the factual look-up needed in an information-retrieval-style search. I can believe that they're doing some form of extrapolation to create novel solutions to posed problems.

It's interesting that this paper doesn't contradict the conclusions of the Apple LLM paper [0], where prompts were corrupted to force the LLM into making errors. I can also believe that LLMs can only make small deviations from existing example solutions when creating these novel solutions.

I hate that we're using the term "reasoning" for this solution-generation process. It's a term coined by LLM companies to evoke an almost emotional response in how we talk about this technology. However, it does appear that we are capable of instructing machines to follow a series of steps using natural language, with some degree of ambiguity. That in and of itself is a huge stride forward.

[0] https://machinelearning.apple.com/research/gsm-symbolic
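A minimal sketch of the perturbation style the GSM-Symbolic work describes, as I understand it: re-sample the surface details of a templated problem and optionally append an irrelevant clause that should not change the answer. The template, names, and the make_variant helper below are invented for illustration, not taken from the paper.

    import random

    # Toy GSM-style template; the wording and the distractor clause are
    # placeholders, not items from the actual GSM-Symbolic suite.
    TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
                "How many apples does {name} have?")
    DISTRACTOR = " Five of the apples are slightly smaller than the rest."

    def make_variant(seed: int, with_distractor: bool = False) -> dict:
        """Re-sample surface details of the same underlying problem."""
        rng = random.Random(seed)
        a, b = rng.randint(2, 20), rng.randint(2, 20)
        name = rng.choice(["Ava", "Ben", "Chen"])
        question = TEMPLATE.format(name=name, a=a, b=b)
        if with_distractor:
            # Irrelevant numeric detail: the correct answer is unchanged.
            question += DISTRACTOR
        return {"question": question, "answer": a + b}

    for i in range(3):
        print(make_variant(i, with_distractor=(i == 2)))

The idea is that the correct answer is invariant to these edits, so any change in model behaviour reflects sensitivity to surface form rather than to the underlying problem.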
jpcom, 5 months ago
You mean you need humans to step-by-step solve a problem so a neural net can mimic it? It sounds kinda obvious now that I write it out.
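If that is the right reading, the unit of training data is roughly a human worked solution rendered as a plain prompt/target pair. A rough sketch, where the problem, the trace wording, and the field names are all made up for illustration:

    # Hypothetical packaging of a human-written procedural trace as a
    # supervised training pair; the schema is invented for illustration.
    problem = "A train travels 60 km in 1.5 hours. What is its average speed?"

    procedural_trace = [
        "Recall that average speed = distance / time.",
        "Substitute the values: 60 km / 1.5 h.",
        "Compute the division: 60 / 1.5 = 40.",
        "State the result with units: 40 km/h.",
    ]

    training_example = {
        "prompt": problem,
        "target": "\n".join(f"Step {i + 1}: {s}"
                            for i, s in enumerate(procedural_trace)),
    }

    print(training_example["target"])

The paper's claim, as I read it, is that many such traces teach the procedure itself rather than serving as answers to be looked up.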
ijk, 5 months ago
This would explain the unexpected benefits of training on code.
ninetyninenine, 5 months ago
> On the one hand, LLMs demonstrate a general ability to solve problems. On the other hand, they show surprising reasoning gaps when compared to humans, casting doubt on the robustness of their generalisation strategies.

I'm surprised this gets voted up, given the surprising number of users on HN who think LLMs can't reason at all and that the only way to characterize an LLM is through the lens of a next-token predictor. Last time I was talking about LLM intelligence, someone rudely told me to read up on how LLMs work, and that we already know exactly how they work and they're just token predictors.
btilly, 5 months ago
This is highly relevant to the recent discussion at https://news.ycombinator.com/item?id=42285128.

Google claims that their use of pretraining is a key requirement for being able to deliver a (slightly) better chip design. And they claim that a responding paper that did not attempt pretraining should have been expected to be well below the state of the art in chip design.

Given how important reasoning is for chip design, and how important pretraining is for driving reasoning in large language models, Google's argument is very reasonable. If Google barely beats the state of the art while using pretraining, an attempt that doesn't pretrain should be expected to be well below the current state of the art. And therefore the second attempt's poor performance says nothing about whether Google's results are plausible.
andai, 5 months ago
> In the extreme case, a language model answering reasoning questions may rely heavily on retrieval from parametric knowledge influenced by a limited set of documents within its pretraining data. In this scenario, specific documents containing the information to be retrieved (i.e. the reasoning traces) contribute significantly to the model's output, while many other documents play a minimal role.

> Conversely, at the other end of the spectrum, the model may draw from a broad range of documents that are more abstractly related to the question, with each document influencing many different questions similarly, but contributing a relatively small amount to the final output. We propose generalisable reasoning should look like the latter strategy.

Isn't it much more impressive if a model can generalize from a single example?
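Setting that question aside: one toy way to quantify the spectrum the quoted passage describes is to look at how concentrated per-document influence scores are for a single query, e.g. the share of total influence carried by the top-k documents. The synthetic scores and the top_k_share helper below are assumptions for illustration, not the paper's actual influence-function machinery.

    import numpy as np

    def top_k_share(influence: np.ndarray, k: int = 10) -> float:
        """Fraction of total absolute influence carried by the k most
        influential pretraining documents for one query."""
        mag = np.abs(influence)
        return float(np.sort(mag)[-k:].sum() / mag.sum())

    rng = np.random.default_rng(0)

    # Retrieval-like regime: a handful of documents dominate.
    retrieval_like = np.concatenate([
        rng.uniform(5.0, 10.0, size=5),        # a few highly influential docs
        rng.normal(0.0, 0.001, size=9_995),    # everything else near zero
    ])

    # Generalisation-like regime: influence spread thinly over many documents.
    generalisation_like = rng.normal(0.0, 0.1, size=10_000)

    print("retrieval-like top-10 share:     ", round(top_k_share(retrieval_like), 3))
    print("generalisation-like top-10 share:", round(top_k_share(generalisation_like), 3))

A retrieval-heavy answer would show a large top-k share; the generalisation the authors argue for would show a small one.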
semessier, 5 months ago
That resonates: fewer facts and more reasoning training data. The lowest-hanging fruit in terms of non-synthetic data is probably mathematical proofs. With Prolog and the like, many alternate reasoning paths could be generated. It's hard to say whether these many-path traces would help in LLM training without access to the gigantic machines (it's so unfair) to try it on.
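As a concrete (if trivial) picture of what "many alternate reasoning paths" to the same result could look like as training data, the sketch below enumerates different orderings of a sum and emits a step-by-step trace for each. It is plain Python rather than Prolog, and everything in it (the derivation helper, the numbers) is invented for illustration.

    from itertools import permutations

    def derivation(numbers: tuple) -> list[str]:
        """One step-by-step trace for summing the numbers in a given order."""
        steps, running = [], numbers[0]
        for n in numbers[1:]:
            steps.append(f"{running} + {n} = {running + n}")
            running += n
        return steps

    target = (2, 5, 7, 11)

    # Every ordering reaches the same total via a different trace; a Prolog
    # program or proof assistant could play the same role for real proofs.
    traces = [{"order": order, "steps": derivation(order), "result": sum(order)}
              for order in permutations(target)]

    for t in traces[:3]:
        print(t["order"], "->", t["steps"])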
largbae, 5 months ago
Is this conclusion similar to my layman's understanding of AlphaGo vs AlphaZero? That human procedural knowledge helps ML training to a point, and from there on becomes a limitation?
ricardobeat, 5 months ago
Does this mean LLMs might do better if trained on large amounts of student notes, exams, book reviews and such? That would be incredibly interesting.
samirillian, 5 months ago
Okay, dumb question: why are the images they generate nightmarish nonsense? Why can't they procedurally construct a diagram?
sgt101, 5 months ago
drives retrieval of patterns of procedure?

I mean, like for arithmetic?
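For arithmetic, a "pattern of procedure" could be something like a digit-by-digit trace of column addition with explicit carries. The addition_trace helper and its output format below are made up for illustration; they are not from the paper.

    def addition_trace(a: int, b: int) -> list[str]:
        """Column addition rendered as an explicit, step-by-step text trace."""
        da, db = str(a)[::-1], str(b)[::-1]
        steps, digits, carry = [], [], 0
        for i in range(max(len(da), len(db))):
            x = int(da[i]) if i < len(da) else 0
            y = int(db[i]) if i < len(db) else 0
            total = x + y + carry
            steps.append(f"column {i}: {x} + {y} + carry {carry} = {total}, "
                         f"write {total % 10}, carry {total // 10}")
            digits.append(str(total % 10))
            carry = total // 10
        if carry:
            digits.append(str(carry))
            steps.append(f"final carry: write {carry}")
        steps.append("result: " + "".join(reversed(digits)))
        return steps

    print("\n".join(addition_trace(478, 256)))

Seeing many such traces over different operands is the kind of procedural signal the paper points to, as opposed to retrieving any one memorised sum.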
shermantanktop, 5 months ago
Going meta a bit: comments so far on this post show diametrically opposing understandings of the paper, which demonstrates just how varied the interpretation of complex text can be.

We hold AI to a pretty high standard of correctness, as we should, but humans are not that reliable on matters of fact, let alone on rigor of reasoning.
ScottPowers, 5 months ago
thanks