What Happened to Old School NLP?

91 points by psygnisfive · over 10 years ago

9 comments

danieldk · over 10 years ago
*We'll start to see the re-emergence of tools from old-school NLP, but now augmented with the powerful statistical tools and data-oriented automation of new-school NLP. IBM's Watson already does this to some extent.*

This is not a new trend. As early as 1997, Steven Abney augmented [1] attribute-value grammars with discriminative modelling (in this case, maximum entropy models) to form 'stochastic attribute-value grammars'. There is a lot of work on efficiently extracting the best parse from packed forests, etc. Most systems that rely on unification grammars (e.g. HPSG grammars) already use stochastic models.

In the early to mid 2000s, when modelling association strengths from structured or unstructured text became popular, old-school parsers began adopting such techniques to learn selectional preferences that cannot be learnt from the usually small hand-annotated treebanks. E.g. in languages that normally have SVO (subject-verb-object) order in main clauses but also permit OVS order, parsers trained on small hand-annotated treebanks would often be set on the wrong path when the direct object is fronted (analyzing the direct object as the subject). Techniques from association-strength modelling were used to learn selectional preferences such as 'bread is usually the object of eat' from automatically annotated text [2].

In recent years, learning word-vector representations using neural networks has become popular. Again, not surprisingly, people have been integrating these vectors as features in the disambiguation components of old-school NLP parsers, in some cases with great success.

tl;dr: the flow of ideas and tools from new-school NLP to old-school NLP has been going on ever since the statistical NLP revolution started.

[1] http://ucrel.lancs.ac.uk/acl/J/J97/J97-4005.pdf

[2] http://www.let.rug.nl/vannoord/papers/iwptbook.pdf
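[Editor's note: as a purely illustrative sketch of the log-linear ("maximum entropy") disambiguation style described above, not the code of Abney's or any other system mentioned here: each candidate analysis is scored by a weighted feature vector that can mix grammar-rule counts with corpus-derived selectional-preference scores, and the highest-scoring parse wins. All feature names and weights below are invented for this example.]

    # Minimal, assumed-for-illustration sketch of log-linear parse
    # disambiguation: score each candidate with w . f(parse), keep the
    # argmax. A full maxent model exponentiates and normalizes the scores
    # into probabilities, but normalization cancels under argmax.
    import numpy as np

    def features(parse):
        # Toy feature vector mixing rule counts with a selectional-
        # preference score (how plausible the candidate subject is as
        # the agent of the verb), learned from auto-annotated text.
        return np.array([
            parse["n_svo_rules"],
            parse["n_ovs_rules"],
            parse["agent_verb_plausibility"],
        ], dtype=float)

    def best_parse(candidates, w):
        scores = [float(w @ features(p)) for p in candidates]
        return candidates[int(np.argmax(scores))]

    # Two readings of a fronted-object sentence: the correct OVS reading
    # wins because its agent is a far more plausible subject of the verb.
    svo = {"n_svo_rules": 1, "n_ovs_rules": 0, "agent_verb_plausibility": 0.1}
    ovs = {"n_svo_rules": 0, "n_ovs_rules": 1, "agent_verb_plausibility": 0.9}
    w = np.array([0.5, 0.3, 2.0])  # would be estimated from a treebank
    print(best_parse([svo, ovs], w) is ovs)  # True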
agentile · over 10 years ago
Reading this article, particularly the part about sentiment analysis, was interesting to me because last year I did my thesis [1] on sentiment classification using a somewhat mixed (albeit pretty simple) approach, where I factored in basic sentence structure in addition to word features and saw improvements in accuracy. I found it really neat to see cases where particular sentence structures like PRP RB VB DT NN were much more likely to show up with positive sentiment (e.g. "I highly recommend this product") than with negative sentiment (e.g. "They totally misrepresent this product").

I get the impression that while the computational side of computational linguistics has seen more attention for lucrative reasons, now that it is finding success more people are trying to incorporate ideas from the linguistic side, at least where doing so doesn't incur a huge computational expense.

It doesn't seem like anything new, however, that business needs drive funding for particular areas of academia. Sadly, that is true more than ever, considering the greed of the school systems (but that is another topic).

[1] https://digital.lib.washington.edu/researchworks/handle/1773/24983
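[Editor's note: a rough sketch of this kind of mixed feature set. The tiny dataset, hand-written POS tags, and feature names are all invented for illustration; a real system would run a tagger such as nltk.pos_tag over a proper corpus. Word unigrams and POS-tag trigrams feed one classifier, so structural patterns like PRP RB VB can contribute alongside the words themselves.]

    # Assumed, illustrative sketch: words + POS-sequence features in one
    # linear sentiment classifier.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def featurize(tokens, tags):
        feats = {}
        for tok in tokens:                      # word unigram features
            feats["w=" + tok.lower()] = 1
        for i in range(len(tags) - 2):          # POS trigram features
            feats["t=" + " ".join(tags[i:i + 3])] = 1
        return feats

    train = [
        (("I", "highly", "recommend", "this", "product"),
         ("PRP", "RB", "VB", "DT", "NN"), "pos"),
        (("They", "totally", "misrepresent", "this", "product"),
         ("PRP", "RB", "VB", "DT", "NN"), "neg"),
        (("It", "works", "well"), ("PRP", "VBZ", "RB"), "pos"),
        (("It", "broke", "quickly"), ("PRP", "VBD", "RB"), "neg"),
    ]

    vec = DictVectorizer()
    X = vec.fit_transform(featurize(toks, tags) for toks, tags, _ in train)
    y = [label for _, _, label in train]
    clf = LogisticRegression().fit(X, y)

    test = featurize(("I", "strongly", "recommend", "it"),
                     ("PRP", "RB", "VB", "PRP"))
    print(clf.predict(vec.transform([test])))   # expect ['pos']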
dsfsdfd · over 10 years ago
I think it's more likely that we will use machine learning to learn syntactic structure, rather than hand-craft these pieces of machinery. For a long time we tried to create intelligent machines by designing them to solve a problem; now, at last, we are designing machines that learn to solve problems, and finding success. I see no reason to imagine that going back to the old-school methods, with a layer of the new magic on top, is going to be effective over the medium term. In the short term, possibly, but only briefly.
JacobiX · over 10 years ago
The beginning of the article reminds me of the quote: "Every time I fire a linguist, the performance of our speech recognition system goes up." But nowadays statistical NLP systems regularly use syntactic and semantic information as features in the learning phase.
lazzlazzlazz · over 10 years ago
NetBase (http://www.netbase.com) uses this kind of "old-school" NLP (with a large team of full-time linguistics PhDs) augmented with statistical tools and increasingly sophisticated forms of automation.

The end product is more accurate and quicker to adapt than the industry is used to.

Disclosure: I work on the engineering team at NetBase.
ryanmim · over 10 years ago
This is a pretty good explanation of why almost all practical applications of NLP are now accomplished with statistics rather than the fancy linguistic grammar models you might have read about in a Chomsky book.

Old-school NLP has always fascinated me, though, and I'm pretty excited about what might be possible in the future by using more than purely statistical methods for NLP tasks. Maybe the author could have speculated more wildly in his prognostication ;)
VLM · over 10 years ago
There is a greater economic lesson here: tech does not necessarily have the driver's seat in the economy.

"but with the advent of computers, it became possible to monetize NLP and the priority shifted to making products people would buy, rather than a system that was scientifically correct."

The competition for an NLP computer program is not another NLP computer program, but call centers in India, the Philippines, onshore prison labor: that kind of "support".
sdoering · over 10 years ago
I really liked the article, and some of the blog headlines seemed interesting as well. But try as I might, I was not able to find an RSS/Atom/XML feed for plugging this resource into my feed reader, so sadly I will probably miss upcoming interesting posts.
dschiptsov · over 10 years ago
The subtle ideas from the original "Structure of Magic" books, about how we construct our internal representations of reality depending on the wording we use, have been replaced by an industry of coaches and consultants.

The ideas, by the way, have been studied by mainstream psychology as the framing effect and the priming effect.

In short, our minds do lexical analysis and decomposition subconsciously, so we can be influenced by specially crafted sentences. We also leak details of our internal representations of some aspects of reality in the way we unconsciously construct sentences.