
On Chomsky and the Two Cultures of Statistical Learning

83 points by georgehill about 2 years ago

19 comments

gabelschlager about 2 years ago
Well, Chomsky already dismissed corpus-based linguistics in the 90s and 2000s, because a corpus (a large collection of text documents, e.g., newspapers, blog posts, literature, or everything mixed together) is never a good enough approximation of the true underlying distribution of all words/constructs in a language. For example, a newspaper-based corpus might have frequent occurrences of city names or names of politicians, whereas they might not occur that often in real everyday speech, because many people don't actually talk about those politicians all day long. Or, alternatively, names of small cities might have a frequency of 0.

Naturally, he will, and does, also dismiss anything that occurred in the ML field in the past decade.

But I agree with the article. Dealing with language only in a theoretical/mathematical way, not even trying to evaluate your theories with real data, is just not very efficient, and it ignores that language models do seem to work to some degree.
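The zero-frequency point above is easy to make concrete: under a maximum-likelihood estimate, any word the corpus never contains gets probability exactly zero. A minimal sketch (the toy "newspaper" corpus and the word choices are illustrative, not taken from the comment):

```python
from collections import Counter

# Toy "newspaper" corpus: politicians and big cities are over-represented.
corpus = ("the senator met the mayor in berlin "
          "the senator spoke about the budget in berlin").split()

counts = Counter(corpus)
total = sum(counts.values())

def mle_prob(word):
    """Maximum-likelihood unigram probability: count / total tokens."""
    return counts[word] / total

print(mle_prob("senator"))    # frequent in newspaper text
print(mle_prob("smalltown"))  # never observed -> exactly 0.0
```

An unsmoothed estimate like this exhibits exactly the mismatch described: the fitted distribution reflects what newspapers print, not how often people actually say these words.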
aj7 about 2 years ago
"As another example, consider the Newtonian model of gravitational attraction, which says that the force between two objects of mass m1 and m2 a distance r apart is given by F = G*m1*m2/r^2 where G is the universal gravitational constant. This is a trained model because the gravitational constant G is determined by statistical inference over the results of a series of experiments that contain stochastic experimental error."

No, it's not. Period. It's a THEORY that is supported "by statistical inference over the results of a series of experiments that contain stochastic experimental error." It has been checked hundreds of times, and its accuracy has gotten better and better. The technology of the checking led to new science. By making it consistent with special relativity, and POSTULATING the principle of equivalence, a new THEORY was born: general relativity, also known as the theory of gravitation. IT has been checked hundreds of times, "the technology of the checking has led to new science," and it is currently a very active field of RESEARCH (not model training).

It's a little horrifying that Norvig doesn't seem to understand these nuances.

The same arguments apply to the solid-state physics underlying the machines that run large language models. That too is a THEORY. It has been checked hundreds of times, the technology of the checking has led to new science, and it is currently a very active field of RESEARCH (not model training).
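Whatever one makes of the theory-versus-model dispute, the narrow claim in the quoted passage, that G is estimated from noisy measurements, can be illustrated directly. A minimal sketch with simulated data (the masses, distances, and 1% noise level are invented for the illustration):

```python
import random

random.seed(0)
G_TRUE = 6.674e-11  # m^3 kg^-1 s^-2, used here only to simulate measurements

# Simulated Cavendish-style trials: F = G*m1*m2/r^2 plus 1% multiplicative noise.
estimates = []
for _ in range(200):
    m1 = random.uniform(1.0, 10.0)  # kg
    m2 = random.uniform(1.0, 10.0)  # kg
    r = random.uniform(0.05, 0.5)   # m
    f = G_TRUE * m1 * m2 / r**2 * (1 + random.gauss(0, 0.01))
    estimates.append(f * r**2 / (m1 * m2))  # solve each trial for G

g_hat = sum(estimates) / len(estimates)
print(g_hat)  # close to 6.674e-11
```

This is the sense in which Norvig calls the constant "trained"; it says nothing, either way, about whether the inverse-square form itself is a theory or a model.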
reliableturing about 2 years ago
As someone who tends to side with Chomsky in these debates, I think Norvig makes some interesting points. However, I would like to pick one of his criticisms to disagree with:

"In 1969 he [Chomsky] famously wrote:

    But it must be recognized that the notion of "probability of a sentence" is an entirely useless one, under any known interpretation of this term.

His main argument being that, under any interpretation known to him, the probability of a novel sentence must be zero, and since novel sentences are in fact generated all the time, there is a contradiction. The resolution of this contradiction is of course that it is not necessary to assign a probability of zero to a novel sentence; in fact, with current probabilistic models it is well-known how to assign a non-zero probability to novel occurrences, so this criticism is invalid, but was very influential for decades."

I think Norvig wrongly interprets Chomsky's "probability of a sentence is useless" as "the probability of a novel sentence must be zero". I agree that we've shown it's possible to assign probabilities to sentences in certain contexts, but that doesn't mean this can fully describe a language and knowledge. This seems to me yet another case of 'the truth is somewhere in the middle', and I would be wary of the false dichotomy put forward here. Yes, we can assign probabilities to sentences and they can be useful, but it's not the whole story either.
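The smoothing Norvig alludes to fits in a few lines: an add-one (Laplace) bigram model assigns a small but strictly non-zero probability to a sentence it has never seen, while the unsmoothed estimate assigns zero. A minimal sketch (the two training sentences are invented for the illustration):

```python
from collections import Counter

train = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
vocab = {w for sent in train for w in sent} | {"<s>", "</s>"}
V = len(vocab)

bigrams, contexts = Counter(), Counter()
for sent in train:
    toks = ["<s>"] + sent + ["</s>"]
    for a, b in zip(toks, toks[1:]):
        bigrams[(a, b)] += 1
        contexts[a] += 1

def prob(sentence, alpha=1):
    """Add-alpha smoothed bigram probability of a sentence."""
    toks = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for a, b in zip(toks, toks[1:]):
        p *= (bigrams[(a, b)] + alpha) / (contexts[a] + alpha * V)
    return p

novel = ["the", "dog", "sleeps"]   # never seen in training
print(prob(novel))                 # small but strictly positive
print(prob(novel, alpha=0))        # unsmoothed MLE: exactly 0.0
```

Whether such a number is a useful characterization of linguistic knowledge is, of course, precisely the point the comment disputes.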
QuadrupleA about 2 years ago
"What did Chomsky mean", the eternal question. He, along with Alan Kay, is among the most (deliberately, I suspect) cryptic, obfuscatory bullshitters alive. They spout twisted, vague statements that can be interpreted in dozens of conflicting ways, and through cult of personality people ascribe great wisdom and genius to them, thinking their grand thoughts must be just beyond the grasp of mere mortals.

True genius can communicate things clearly, I think.

Probably not a popular opinion, and I'm a little cranky, so grain of salt. But Chomsky and his ilk seem like some of the great intellectual hustlers of our time.
getpost about 2 years ago
This has been posted many times before, but I thought of it yesterday when I saw this essay by Chomsky in the New York Times, "The False Promise of ChatGPT":

https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html
robg about 2 years ago
I was in the audience during the panel, and Chomsky was really trying to hold onto a view of science disconnected from the engineering needed to replicate the natural phenomenon. To take a simple example:

Suppose you're the Wright Brothers and you built a flying machine.

Chomsky's response would be: But you didn't explain how birds fly.

On the one hand, sure. Birds fly differently. On the other, the machine flies. Why does it need to explain all of flight?

The same is true of language models. They don't explain how we acquire language. But they replicate language use in ways that look like language speakers.
AlbertCory about 2 years ago
"Every time I fire a linguist, the speech recognition system gets more accurate!"

(https://www.intefrankly.com/articles/Every-time-I-fire-a-linguist-the-speech-recognition-system-gets-more-accurate/0992487dd573)

*The fact is, linguists failed.*

They might have beautiful theories, but the theories have no utilitarian value (unlike the laws of gravity).
robg about 2 years ago
These two cultures played out very early in neural networks; see Rumelhart and McClelland, for example [1]. For language nativists like Chomsky, the engineering will never appreciate all of the intricacies of language learning and use. It's a fair criticism that both forces the engineering to be better and misses the point. Engineering doesn't need to replicate how the natural phenomenon "works" in order to provide a compelling set of uses for humanity. Choose a modality of sensing the world: vision, hearing, language, etc. If the engineering replicates and even improves upon the natural basis, we can continue to focus on improving the engineered solution. Nature is a wonderful source of inspiration, but why be bound to its instantiation? The brain doesn't record memories verbatim. Recording devices are better for that reason.

[1] https://mitpress.mit.edu/9780262680530/parallel-distributed-processing/
photochemsyn about 2 years ago
So far my favorite expository example on this subject involves whether any kind of machine learning model trained on Tycho Brahe's astronomical observational data (late 16th century) would be able to extract the equations of Johannes Kepler (1609) from it. It would likely be able to make good predictions of where planets would be in the future, but those wouldn't be based on simple equations that are easy for humans to look at and understand.

(The equations are: orbits are ellipses; equal areas are swept in equal times by the line joining the orbiting bodies; and the squares of the orbital periods are proportional to the cubes of the orbital distances. All of them are more or less approximations in the solar system, which has many complicating interactions due to all the bodies involved.)

Someone commented that most humans wouldn't be capable of doing that either, which is true enough... Perhaps if the machine learning model were also trained on a large set of equations as well as on a large set of astronomical data?
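Part of the Kepler example can be tested in miniature: a plain least-squares fit on log-transformed data recovers the square:cube relation of the third law, though the fit itself "explains" nothing. A sketch using standard figures for six planets (semi-major axis in AU, period in years):

```python
import math

# (semi-major axis in AU, orbital period in years)
planets = [(0.387, 0.241), (0.723, 0.615), (1.000, 1.000),
           (1.524, 1.881), (5.203, 11.862), (9.537, 29.457)]

xs = [math.log(a) for a, _ in planets]
ys = [math.log(t) for _, t in planets]
xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)

# Ordinary least-squares slope of log T against log a.
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
print(slope)  # ~1.5, i.e. T^2 is proportional to a^3
```

Recovering the clean exponent from pre-digested (a, T) pairs is the easy half; the comment's point stands that a generic predictor fitted to raw positional observations would not hand back anything this compact.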
sampo about 2 years ago
> At the Brains, Minds, and Machines symposium held during MIT's 150th birthday party

So 12 years ago. (The page doesn't have dates, and the links are broken.)
scythe about 2 years ago
Chomsky's argument is not terribly long, nor is it difficult to understand, nor is it nearly as harsh as some people (Aaronson and Norvig, possibly among others) seem to be taking it, nor does he spend much oxygen talking about his own work or preferred approaches (although Aaronson and Norvig seem interested in emphasizing that he mentioned it at all). It is probably worth just reading it:

http://languagelog.ldc.upenn.edu/myl/PinkerChomskyMIT.html

Really this is the core of what Chomsky said:

> There is a succ- notion of success which has developed in uh computational cognitive science in recent years which I think is novel in the history of science. It interprets success as uh approximating unanalyzed data. Uh so for example if you were say to study bee communication this way, instead of doing the complex experiments that bee scientists do, you know like uh having bees fly to an island to see if they leave an odor trail and this sort of thing, if you simply did extensive videotaping of bees swarming, OK, and you did you know a lot of statistical analysis of it, uh you would get a pretty good prediction for what bees are likely to do next time they swarm, actually you'd get a better prediction than bee scientists do, and they wouldn't care because they're not trying to do that. Uh but and you can make it a better and better approximation by more video tapes and more statistics and so on.
> Uh I mean actually you could do physics this way, uh instead of studying things like balls rolling down frictionless planes, which can't happen in nature, uh if you uh uh took a ton of video tapes of what's happening outside my office window, let's say, you know, leaves flying and various things, and you did an extensive analysis of them, uh you would get some kind of prediction of what's likely to happen next, certainly way better than anybody in the physics department could do. Well that's a notion of success which is I think novel, I don't know of anything like it in the history of science.

Now this contrasts powerfully with Norvig's "Galileo" picture. Copernicus and Galileo were excellent mathematicians by contemporary standards. They had data, models, predictions, and criteria for the validity of those models.

How do we *measure* the success of GPT-3? That is the key question Chomsky raises. Galileo's conviction, and his willingness to die defending the truth, was not based merely on the profound experience of looking through a telescope.
foobarqux about 2 years ago
This is like an HN Groundhog Day article.

Every time this article comes up, people point out the flaws in its arguments; everyone ignores them, and the next time the article is posted the same specious reasoning is repeated.
EGreg about 2 years ago
I'm regularly in touch with Chomsky. I've interviewed him several times, the latest being this one with David Harvey: https://www.youtube.com/watch?v=ezf7wxJ7whA

Earlier, regarding Freedom of Speech and Capitalism: https://qbix.com/chomsky

Sometimes I send him articles about how chimpanzees or others have "language". Or now they've found bumblebees can teach each other. Chomsky famously maintains that only humans are born with innate capabilities for language.

Anyway, since Chomsky is a linguist and focused on language, it would make sense for him to say that. Of course, computers can develop their own languages (as Facebook's sales bots did, for example, years ago: https://nypost.com/2017/08/01/creepy-facebook-bots-talked-to-each-other-in-a-secret-language/)

I think that, in general, Chomsky is right that when it comes to language (unlike paintings or even photorealistic fakes, etc.) the *meaning* will never be modeled perfectly, any more than, say, modeling a Mandelbrot set at all levels of zoom can be done by a machine learning system that trains the way generative LLMs train.

Having said that, I think that logic itself is a "poor man's approximation" to what AI can do, in that it just uses a few parameters. I prefer Stephen Wolfram's analysis: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
Mattasher about 2 years ago
> There is a notion of success ... which I think is novel in the history of science. It interprets success as approximating unanalyzed data.

Yes! Yes! Yes! As I've been arguing for years, we've already plucked the low-hanging fruit of science. The vast majority of additional progress will *have* to be made by using what are essentially black-box models, and the proof of a good model will be how well it approximates new data, not how nice the mathematical equation looks or how well it can be explained intuitively.
foobarqux about 2 years ago
I wish there would be a debate at a major venue between Chomsky and the best proponent of LLMs, over text or video, but it's not clear who that opponent would even be. There really isn't a towering pro-LLM intellectual figure, one who can also talk a bit about philosophy, cognitive science, or linguistics.
avgcorrection about 2 years ago
The part where he compares Chomsky to O’Reilly is a good laugh. (No, it’s not a good comparison.)
antipaul about 2 years ago
[2011] or thereabouts
froh about 2 years ago
(2011)
sampo about 2 years ago
> I take Chomsky's points to be the following: 1. Statistical language models have had engineering success, but that is irrelevant to science.

Well, deep neural nets are not statistical models, so shouldn't Chomsky now be at least a little bit happier with ChatGPT?