> we discuss the phenomena of emergent abilities, which we define as abilities that are not present in small models but are present in larger models<p>Reading anything by major researchers in AI feels like an adversarial battle where they're trying to misuse as much technical scientific and philosophical language as possible, and those of us in adjacent fields are trying to hold the line.<p>In philosophy, and especially the philosophy of science, emergence is a relation between a whole and its parts such that a property of the whole does not obtain just in virtue of the properties of its parts taken in isolation. "Emergence" carries this prior positive, semi-magical, scientific association, which confuses the issue here.<p>No property of the LLM obtains from its parts differently as parameters scale; the mechanism is the same. The performance differs not due to emergence, but due to the "modelling gap" between the statistical structure of free text and that of mathematics. With enough examples, the gap closes... indeed, you can model the addition function (add(x, y) = x + y) just from a large enough sample of its domain.<p>A better technical term here might be "scale-dependent capabilities". For LLMs, simple arithmetic is extremely scale-dependent, whereas basic text generation is less so. The reason for this seems obvious, as given above... so I interpret the use of the term "emergence" here as more PR-ish mystification.
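To make the last point concrete: the claim is only that a function like addition is recoverable from samples of its domain, not that an LLM does it this way. A minimal sketch (plain least squares, made-up sample sizes) showing that a handful of (x, y) pairs pins down add(x, y) = x + y exactly:

```python
# Minimal illustration: the addition function is recoverable from a small
# sample of its domain by ordinary least squares. Sizes are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(-100, 100, size=(50, 2)).astype(float)  # sampled (x, y) pairs
t = X[:, 0] + X[:, 1]                                     # targets: x + y

# Fit t ~ X @ w; the exact solution is w = [1, 1].
w, *_ = np.linalg.lstsq(X, t, rcond=None)
print(w)          # ~[1. 1.]
print(X @ w - t)  # residuals ~0: the "modelling gap" is closed for this function
```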
Wikipedia has a fine definition of what _emergent_ means:<p>> In philosophy, systems theory, science, and art, emergence occurs when an entity is observed to have properties its parts do not have on their own, properties or behaviors that emerge only when the parts interact in a wider whole.<p>The linked article uses this definition:<p>> we discuss the phenomena of emergent abilities, which we define as abilities that are not present in small models but are present in larger models<p>The concept in the paper has to do with capabilities / abilities that grow non-linearly as a function of model size. This is distinctly different from _emergent behavior_ in systems theory.<p><opinion>The authors and reviewers could find a better word for their concept. There is no need to muddle the concept.</opinion><p>Furthermore, the idea that networks of certain sizes are necessary for certain kinds of representational abilities is not new. Perhaps a term exists already?
Do these scale-dependent (I like this adjective better than "emergent") properties survive model distillation? It may be that our training/optimization processes are inefficient and require these scales to achieve them, but the underlying model may not actually need as many parameters as we are giving it. I haven't read any of the papers on distillation yet; does anyone know if this has been tested?
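For anyone unfamiliar with the setup being asked about: standard knowledge distillation (in the Hinton-style recipe) trains a small student to match a large teacher's softened output distribution. A rough sketch of that loss, with made-up stand-in logits; whether scale-dependent capabilities survive this procedure is exactly the open question above:

```python
# Sketch of the knowledge-distillation loss (soft targets at temperature T).
# The logits below are made-up stand-ins for a large teacher and a small student.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    # Cross-entropy of the student against the softened teacher targets,
    # scaled by T^2 as in the usual recipe.
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * T**2

teacher = np.array([[4.0, 1.0, -2.0]])   # confident teacher distribution
student = np.array([[2.5, 0.5, -1.0]])   # smaller student, similar ranking
print(distill_loss(student, teacher))
```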
Have there been any efforts in processing calculation prompts, where instead of letting the model 'compute' internally, it's trained to identify equations and hand them to an external calculator instead (perhaps one that outputs not only the result but the individual steps too)?
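A toy sketch of the post-processing half of that idea, assuming (hypothetically) the model has been trained to wrap arithmetic in a `<<...>>` marker; the marker convention and helper names here are made up, not from any particular system:

```python
# Toy post-processor: find arithmetic the model has marked up (assumed
# convention: <<expression>>) and replace it with an externally computed
# result, so the model never has to "compute" internally.
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}

def safe_eval(expr: str):
    """Evaluate +, -, *, / arithmetic only, via the AST (no eval())."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr}")
    return walk(ast.parse(expr, mode="eval"))

def fill_calculations(model_output: str) -> str:
    return re.sub(r"<<(.+?)>>", lambda m: str(safe_eval(m.group(1))), model_output)

print(fill_calculations("The total is <<127 * 48>> apples."))
# -> "The total is 6096 apples."
```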
There's a quite accessible IAS presentation[1] from another Google researcher on <i>Solving Quantitative Reasoning Problems with Language Models</i>, which gives some likely related background on having language models solve this type of math problem, including the "chain of thought" technique mentioned here.<p>I found it pretty interesting, and as something of an ML skeptic I was a bit surprised at the degree of coherence shown in "reasoning" examples similar to the ones in the linked article.<p>1: <a href="https://www.youtube.com/watch?v=qV4Ku5L4BuMt">https://www.youtube.com/watch?v=qV4Ku5L4BuMt</a>
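For anyone who hasn't seen it, "chain of thought" is essentially a prompting technique: the few-shot exemplars include worked-out intermediate steps, which nudges the model to generate its own steps before the final answer. A hedged sketch below; the exemplar text is adapted from the sort of examples used in the chain-of-thought paper, and no particular model API is assumed:

```python
# Sketch of a few-shot chain-of-thought prompt vs. a direct-answer prompt.
# Feed the resulting string to whatever language-model API you use.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

DIRECT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: The answer is 11.\n\n"
)

def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    exemplar = COT_EXEMPLAR if chain_of_thought else DIRECT_EXEMPLAR
    return exemplar + f"Q: {question}\nA:"

print(build_prompt("A cafeteria had 23 apples. It used 20 and bought 6 more. "
                   "How many apples does it have?"))
```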
The x-axis here is training FLOPs, but what about parameter size, and how does it account for the different architectures? Comparing apples to shoelaces may not be a fruitful approach, or indicative of what to expect from ever-expanding scale. Also, is it emergence or overfitting?
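Training FLOPs and parameter count are tied together fairly directly: a common back-of-the-envelope from the scaling-law literature is C ≈ 6·N·D (about 6 FLOPs per parameter per training token), so the x-axis is roughly a proxy for both. A quick sketch, with model/token counts only approximately matching GPT-3- and PaLM-scale runs:

```python
# Back-of-the-envelope: training compute C ~ 6 * N * D FLOPs,
# where N = parameters and D = training tokens (a standard approximation
# from the scaling-law literature). Example sizes are approximate.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

for n_params, n_tokens in [(1e9, 300e9), (175e9, 300e9), (540e9, 780e9)]:
    print(f"{n_params:9.2e} params, {n_tokens:9.2e} tokens "
          f"-> ~{train_flops(n_params, n_tokens):.2e} FLOPs")
```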