On Chomsky and the Two Cultures of Statistical Learning (2011)

85 points, posted by gajju3588 about 7 years ago

8 comments

pesenti, about 7 years ago
I used to think the same way. But after spending the last few years getting frustrated trying to create more complex linguistic systems, e.g., dialog systems or scientific articles understanding, I am coming to the conclusion that the current statistical approach is a dead end. It’s actually impeding the field because it’s working so well for certain tasks that when people are trying to build systems with real understanding, they can’t match the performance obtained by gigantic language models. But the answer is not Chomsky, it’s semantic grounding in the real world rather than looking at language as a sequence of symbols.
d_burfoot, about 7 years ago
Let me rephrase the debate, in a way that can hopefully clarify the main point of contention:

C: Your statistical models are woefully inadequate at describing language.

N: That inadequacy is related particularly to Markov models and Ngram models. More sophisticated statistical models will be adequate.

C: Then why haven't you built the more sophisticated models? Why are you still using Markov models and Ngrams?

N: Those work well enough for engineering applications.

The attitude of "it works well enough for engineering" is what Chomsky is actually criticizing. And that criticism is entirely valid: an empirical scientist would never claim that a theory is true because it can be used in engineering.

It's funny to me that Norvig holds up the PCFG as an example of a new and improved statistical model of language. The PCFG is actually terrible in many ways, the most obvious of which is that it doesn't take into account the Theta Criterion [1], one of the most fundamental phenomena of language. An example of this rule is that a noun phrase can only have one determiner. This restriction is so strong that it will never be violated in any kind of professionally composed text. But it is very awkward to try to encode this rule in a PCFG (you essentially have to split the NP symbol into DetNP vs UndetNP). I wrote a blog post describing the problems of the PCFG formalism:

https://ozoraresearch.wordpress.com/2017/03/17/chuckling-a-bit-at-microsoft-and-the-pcfg-formalism/

[1]: https://en.wikipedia.org/wiki/Theta_criterion
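The DetNP vs UndetNP split mentioned in the comment above can be sketched in a few lines of Python. This is an illustrative toy, not code from the comment or the linked blog post: both grammars, their probabilities, and the helper name `make_inside` are assumptions made up for the example. It computes the probability that a symbol derives a token sequence, and compares a naive grammar with a recursive `NP -> Det NP` rule against one where NP is split in two.

```python
from functools import lru_cache

def make_inside(grammar):
    """Build inside(symbol, tokens): the total probability that `symbol`
    derives exactly `tokens` under a toy PCFG.

    `grammar` maps a nonterminal to a list of (rhs, probability) pairs,
    where rhs is a tuple of one or two symbols. Any symbol that is not a
    key of `grammar` is treated as a terminal word. The grammar must not
    contain unary cycles (e.g. A -> B and B -> A).
    """
    @lru_cache(maxsize=None)
    def inside(symbol, tokens):
        if symbol not in grammar:  # terminal: must match a single token
            return 1.0 if tokens == (symbol,) else 0.0
        total = 0.0
        for rhs, p in grammar[symbol]:
            if len(rhs) == 1:
                total += p * inside(rhs[0], tokens)
            else:
                left, right = rhs
                # Try every way of splitting the tokens between the two children.
                for k in range(1, len(tokens)):
                    total += p * inside(left, tokens[:k]) * inside(right, tokens[k:])
        return total
    return inside

# Hypothetical lexical rules shared by both grammars.
LEXICON = {
    "Det": [(("the",), 0.7), (("a",), 0.3)],
    "N": [(("dog",), 0.5), (("cat",), 0.5)],
}

# Naive grammar: the recursive rule lets determiners stack up.
naive = {"NP": [(("Det", "NP"), 0.4), (("N",), 0.6)], **LEXICON}

# Split grammar: at most one determiner, at the cost of extra symbols.
split = {
    "NP": [(("DetNP",), 0.4), (("UndetNP",), 0.6)],
    "DetNP": [(("Det", "UndetNP"), 1.0)],
    "UndetNP": [(("N",), 1.0)],
    **LEXICON,
}

naive_inside = make_inside(naive)
split_inside = make_inside(split)

bad = ("the", "the", "dog")
good = ("the", "dog")
print("naive:", naive_inside("NP", good), naive_inside("NP", bad))
print("split:", split_inside("NP", good), split_inside("NP", bad))
```

Under these assumed grammars, the naive version assigns nonzero probability to "the the dog", while the split version assigns it zero. The price is that every rule mentioning NP now has to be duplicated or rewritten for the split symbols, which is the awkwardness the comment describes.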
seagullz, about 7 years ago
Further elaborations by Chomsky on what he meant: https://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/

Some video clips of this interview by Yarden Katz: http://yarden.github.io/pages/chomsky/
eirikma, about 7 years ago
This is a fantastic challenge for computer engineers: create an interpreter or compiler for a programming language, based on statistical modeling. How likely is it that combining the token "foo" with "bar" using a mathematical operator is a valid construct, based on experience from other programs? What will the resulting value usually be, based on prior experience?
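As a rough illustration of the thought experiment above, here is a minimal sketch of a bigram model that estimates how plausible a token sequence is from "experience" with other programs. Everything in it is an assumption for the sake of the example: the tiny corpus, the coarse token kinds (`ID`, `NUM`, operators), and the function names are all hypothetical, and a bigram model is only one of many ways such a statistical interpreter could be approached.

```python
from collections import Counter

def train_bigrams(programs):
    """Count adjacent token pairs across a corpus of tokenized programs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for tokens in programs:
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return pair_counts, prev_counts

def sequence_probability(tokens, pair_counts, prev_counts):
    """Maximum-likelihood probability of each token given the previous one,
    multiplied along the sequence. Unseen contexts yield 0.0."""
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        if prev_counts[prev] == 0:
            return 0.0
        prob *= pair_counts[(prev, nxt)] / prev_counts[prev]
    return prob

# Hypothetical "prior experience": programs reduced to token kinds,
# so that identifiers such as foo and bar both appear as ID.
corpus = [
    ["ID", "+", "ID"],
    ["ID", "*", "NUM"],
    ["NUM", "+", "ID"],
    ["ID", "=", "ID", "+", "NUM"],
]

pair_counts, prev_counts = train_bigrams(corpus)

foo_plus_bar = sequence_probability(["ID", "+", "ID"], pair_counts, prev_counts)
foo_plus_plus = sequence_probability(["ID", "+", "+"], pair_counts, prev_counts)
print("ID + ID:", foo_plus_bar)
print("ID + +:", foo_plus_plus)
```

A model like this can only say that a construct resembles what it has seen before. It cannot guarantee that the construct is valid, which is presumably the point of the challenge.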
wglb, about 7 years ago
I am also reminded of the unreasonable effectiveness of data article: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf
melling, about 7 years ago
What’s the difference between a probabilistic model and a statistical model?
dpf, about 7 years ago
Previous discussion:

https://news.ycombinator.com/item?id=11951444 (2016)

https://news.ycombinator.com/item?id=2591154 (2011)
ronilan, about 7 years ago
They both must rattle, an Epic Rap Battle.