
On Chomsky and the Two Cultures of Statistical Learning (2011)

152 points by kercker, almost 9 years ago

19 comments

mrow84 almost 9 years ago
I think that Norvig hits the nail on the head near the beginning of his piece:

"I believe that Chomsky has no objection to this kind of statistical model [the Newtonian model of gravitational attraction]. Rather, he seems to reserve his criticism for statistical models like Shannon's that have quadrillions of parameters, not just one or two."

This is no more than an objection to problems of fitting your chosen model to data. If you only have a small number of free parameters, then you can fit your model with a reasonable amount of data. If you have a large number of parameters, then you have to introduce some extra assumptions, as Norvig (of course) acknowledges slightly earlier (described as "smoothing", in context):

"For example, a decade before Chomsky, Claude Shannon proposed probabilistic models of communication based on Markov chains of words. If you have a vocabulary of 100,000 words and a second-order Markov model in which the probability of a word depends on the previous two words, then you need a quadrillion (10^15) probability values to specify the model. The only feasible way to learn these 10^15 values is to gather statistics from data and introduce some smoothing method for the many cases where there is no data."

Thus, although both models are statistical, it is much easier to have confidence in Newton's law of gravitation than it is in a Markov model of some communication channel, because the data tell a clear story. The imprecision of Newton's law in certain parts of the problem space (unobserved during his time) is a moot point - any such objections apply equally well to models with many parameters, and then you _still_ have to accept that you have made extra assumptions "outside" the scope of your model.

If you can explore your entire problem space, then you can build a complete "model". If not, then having more parameters than data _requires_ additional assumptions. Chomsky's point stands.
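For concreteness, here is a minimal sketch (not from the comment or from Norvig's essay) of the second-order Markov model described above, with add-one smoothing standing in for the smoothing method mentioned; the tiny corpus and vocabulary are invented purely for illustration.

    from collections import defaultdict

    # Toy corpus and vocabulary, purely for illustration.
    corpus = "the cat sat on the mat and the cat ate the rat".split()
    vocab = sorted(set(corpus))
    V = len(vocab)  # with a 100,000-word vocabulary, V**3 trigram parameters ~ 10^15

    # Count how often each word follows each pair of preceding words.
    trigram_counts = defaultdict(lambda: defaultdict(int))
    context_counts = defaultdict(int)
    for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
        trigram_counts[(w1, w2)][w3] += 1
        context_counts[(w1, w2)] += 1

    def prob(w3, w1, w2):
        # P(w3 | w1, w2) with add-one (Laplace) smoothing, so trigrams never
        # seen in the data get a small nonzero probability instead of zero.
        return (trigram_counts[(w1, w2)][w3] + 1) / (context_counts[(w1, w2)] + V)

    print(prob("sat", "the", "cat"))  # observed in the corpus
    print(prob("rat", "the", "cat"))  # never observed; nonzero only because of smoothing

The smoothing is exactly the kind of extra assumption the comment refers to: it is imposed from outside the data, and the resulting probabilities for unseen trigrams say nothing about why those word sequences do or do not occur.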
fauigerzigerk almost 9 years ago
Let's consider the Supreme Court handshake problem, but say there is an unwritten social law that forces judges to only ever initiate a handshake with judges less senior than themselves. If the seniority of two particular judges happened to be exactly equal, they would not shake hands at all.

Let's assume (I don't know if it is actually true or not) that in the history of the Supreme Court, there have never been two judges of the exact same seniority. In that case, a model learned from handshake data would not include the slightest hint of this unwritten social law.

I think what Chomsky is saying is that if we do not understand the generative principle behind any data, we cannot possibly know what circumstance might completely invalidate our model. There may not be a way to smooth this out.

Language understanding, contrary to things like speech recognition, does not lend itself very well to smoothing.
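A toy simulation of this scenario (my own sketch, not from the comment; the judges and seniorities are invented) shows why such data can never reveal the rule:

    import itertools
    import random

    random.seed(0)
    seniorities = random.sample(range(1, 100), 9)  # nine judges, all distinct by construction

    # Record every pairwise encounter: under the unwritten rule, two judges shake
    # hands unless their seniority is exactly equal -- which never happens here.
    data = [((a, b), a != b) for a, b in itertools.combinations(seniorities, 2)]

    # The best model learnable from this history is "judges always shake hands":
    # it fits every observation perfectly.
    print(all(shook for _, shook in data))  # True

    # The case the data carry no hint of: two judges of equal seniority.
    a = b = 50
    print(a != b)  # False -- the generative rule says no handshake; the learned model says yes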
UhUhUhUh almost 9 years ago
Again, this boils down to the rationalist vs. empiricist stance. Hidden variable vs. probabilistic approximation and so forth. The partisan error is to consider these two stances as mutually exclusive. They are not, as illustrated by decoherence, for example. On the other hand, it is, I believe, undeniable that the rationalist endeavor is much more complex than the empiricist one and therefore also less linear. Less predictable and, please, let's not forget, less financially profitable. Chomsky's position has always been to advocate for the rationalist stance in a world conveniently inebriated with its empiricist successes to the point of turning this one aspect of thinking into a belief system. It is a waste of energy to argue for a monopoly of the yin over the yang or the other way around. They are complementary but not mutually exclusive. Privileging one over the other will result in an increase of ideological/mystical bias and essentially miss the whole point: it is their interaction, the transition from and to one another, that holds the key to a global understanding.
atmosx almost 9 years ago
In this video[1] Varoufakis takes on modern economics and its models as a way of understanding the real world and predicting what's going to happen.

On a higher level, I believe he speaks for most (if not all) social sciences and what happens when they are crossed with mathematical models that blindly try to understand and predict *the real world* through a flawed, limited formalism.

[1] https://youtu.be/L5AUAIzciLE?t=1355
faizshah almost 9 years ago
More info on Chomsky's argument here: http://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/
cschmidt almost 9 years ago
If two cultures isn't enough for you, there was an interesting blog post from a year ago called "The Three Cultures of Machine Learning":

http://cs.jhu.edu/~jason/tutorials/ml-simplex.html
dang almost 9 years ago
Discussed at the time: https://news.ycombinator.com/item?id=2591154
mcguire almost 9 years ago
Chomsky's aversion to statistical techniques is much deeper than most of this discussion focuses on.

Here's an enlightening quote from Chomsky:

"*Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech community, who knows its (the speech community's) language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of this language in actual performance.*" (Chomsky, 1965, p. 3)

Chomsky is uninterested in linguistic data of the kind used to build statistical language models; those are "linguistic performances" and he is only looking at "linguistic competence", the ability of an ideal speaker to "produce and understand an infinite number of sentences in their language, and to distinguish grammatical sentences from ungrammatical sentences."[1]

Now, I'm personally happy to criticise statistical techniques for their lack of explanatory power. But I'm not willing to go further and say that data is irrelevant. Chomsky is.

[1] https://en.wikipedia.org/wiki/Linguistic_competence
thanatropism almost 9 years ago
It seems to me that overarching theories of a Platonic bent miss the embodied-ness, the Dasein-ness, of human activity.

Never mind the debates about the ultimate nature of the human cognitive process; the fact of the matter is that *as observed*, it is always-already wrapped in emotional-social thinking. Enough that there is reason to question the subject-object split altogether.

Now, maybe Chomsky is a kind of extreme social-cognitivist and his abstract generative trees apply to societies as learning and meaning-producing wholes. But on the face of the facts, rather than metaphysical speculation as to the nature of personality, intentionality and individuality, it would seem to me that the statistical/machine learning approach already faces language as it happens: as embodied in media, social context and so on.

In other words: I fail to see much value in an abstract account of "pure language" dissociated from the real communicative process as it happens right now as you read me. Sure, "insights" -- but it remains to be shown that "pure linguistics" is a worthwhile endeavor on the level of "pure quantum mechanics" as a formal model.
pierrebai almost 9 years ago
One can try to divine what Chomsky really thinks, believes and means, but one thing that has always annoyed me is his repeated absolute stances and the way he expresses his viewpoint as being the only right one, often with disdain or dismissal of the opposition. You can disagree with someone, but doing it elegantly is of higher value to me.
gcb0 almost 9 years ago
It's important to know that Chomsky pushed back against the same ideas in the 70s. He opposed biologicism (not sure about the English translation of the term), which is basically machine learning applied to sociology. He fell on his face then, and is probably now in a very good position to throw this criticism.
ntoshev almost 9 years ago
I wish there were a follow-up to this; Peter Norvig has hinted a few times that he is going to write one.
fouc almost 9 years ago
This seems a little ironic. I feel there are some incredibly useful things in AI/ML that rarely get applied to real-world problems; instead, academics spend all their time trying to unravel the black-box behavior and come up with some model for it.
marmaduke almost 9 years ago
Same spirit as his critique of B.F. Skinner's theory of verbal behavior oh so many years ago.
3pt14159 almost 9 years ago
Chomsky is wrong that statistical models of language provide no insight. They do.

Chomsky is right that language has *meaning* and that many modern statistical techniques essentially ignore this.

My take on the issue is that you can't separate linguistic command from true intelligence/cognition. There's a long tail of tricks that we intelligent people can use, but fundamentally they'll only be tricks. And if we truly get something resembling a perfect linguistically aware AI through this long tail of tricks, then we've probably accidentally created real cognition. Maybe after typing this all out I finally understand what Turing meant.
rooundio almost 9 years ago
tl;dr "Classical" (natural) scientist: "X is only understood once I have found first principles that X can be reduced to." "Modern" scientist: "X might be so complicated that there are no first principles that X can be reduced to. Rather, finding a neural network that can do X is the best I can do, and it explains why X can be done."
zump almost 9 years ago
What does this mean for the YC chatbot startups?!
dschiptsov almost 9 years ago
Language as a phenomenon is obviously neither purely functional nor purely statistical. Purity is an abstract nonsense, an abstract category of abstractions.

It is obvious from serious psychological studies of the process of language acquisition that it is similar to training a neural network - some knowledge representation grows in the brain, and the process of training/learning is possible because the brain has the appropriate machinery for it.

It seems that we have more than the two a priori notions of time and space; we perhaps also have a priori notions of a thing (noun), a process (verb), an attribute (adjective) and even a predicate at the very least, as reflections of our perceptions of the physical universe around us through the sensory-processing machinery we happen to have evolved.

It is a mutually recursive process - we evolved our "inner representation" of reality constrained by the senses, but nature selects, in some cases, those with more correct representations.

How these a priori notions map to sounds - the details of phonology and morphology - is rather irrelevant; we evolved machinery for that. This is why there are no fundamental, principled differences between human languages. The difference is one of degree, not of kind.

It seems also that we learn not rules (schools are very recent innovations) but "weights", by being exposed to the medium of a local spoken language. Children do it on their own, at least in remote areas, like among the nomads of the Himalaya, no worse than Americans. This, by the way, is proof that we have everything we need to be a Buddha or an Einstein.

How exactly the training occurs is absolutely unknown, but it has nothing to do with probabilities. Nature knows nothing about probabilities, but it obviously "knows" rates - how often something happens. Animals "know" how often something happens.

Probabilities are an invention of the mind, which leads to many errors in cases where not all possible outcomes and their causes are known, which is almost always the case. Nature could not rely on such a faulty tool.

So, like every naturally complex system, language has both "procedures" and "weighted" data. Language capacity is hardwired, but grammar "grows" according to exposure.

To speak about the hows, and especially the how-exactlys, in terms of either pure procedures or pure statistics is misleading. It is both.

And Mr. Chomsky is right - mere data, let alone probabilistic models, describe nothing about the principles behind what is going on. They do not even describe what is going on correctly, only some approximation to an overview of something unknown being partially observed.

A more or less correct model, as a philosophy, must be grounded in reality, especially in that part of it which we call the mind. It has been pointed out that mind itself is possible because of hardwired a priori notions (grounded in the physical universe) of succession and distance, so models should be augmented with these notions too. Pure statistics is nothing.
indubitably almost 9 years ago
This is totally off-topic, but man, Norvig writes some miserable HTML.