A Guide to Natural Language Processing

418 points by ftomassetti, over 7 years ago

14 comments

bhaak, over 7 years ago
> Essentially, when dealing with natural languages hacking a solution is the suggested way of doing things, since nobody can figure out how to do it properly.

That's really the TL;DR I also got from the computational linguistics courses I attended.

The Pareto principle is probably at work here. Having no solution is worse than having an 80% solution that works well enough when the 100% solution is much harder to achieve (and some of the problems not even humans would be able to solve properly).
nl, over 7 years ago
Ha, there's a whole section on clones of the summarizer from Classifier4J.

I wrote that in 2003 (I think?) based on @pg's "A plan for spam" essay, and then "invented" the summarization approach (I'm sure others had done similar, but I thought it up myself anyway).

Turns out it was rather well tuned. The 2003 implementation, presumably downloaded from SourceForge(!), still wins comparisons on datasets which didn't even exist when I wrote it[1].

I much prefer the Python implementation though[2], which I hadn't seen before.

Also, Textacy on top of spaCy is awesome for any kind of text work.

[1] https://dl.acm.org/citation.cfm?id=2797081

[2] https://github.com/thavelick/summarize/blob/master/summarize.py
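The comment above does not spell out how the summarizer works, so what follows is only a rough sketch of one simple frequency-based extractive approach, not the Classifier4J code or the linked Python port. It assumes that a summary can be built by taking the most frequent non-stop-words and keeping the first sentence that mentions each one. The names `summarize`, `split_sentences`, `tokenize`, and the tiny `STOP_WORDS` list are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return its word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def summarize(text, max_sentences=2):
    """Pick up to max_sentences sentences that cover the most frequent words."""
    sentences = split_sentences(text)
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    chosen = set()
    # Walk words from most to least frequent; for each word,
    # keep the first sentence that contains it.
    for word, _ in counts.most_common():
        for index, sentence in enumerate(sentences):
            if word in tokenize(sentence):
                chosen.add(index)
                break
        if len(chosen) >= max_sentences:
            break
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


text = (
    "Cats chase mice. Dogs chase cats. "
    "Birds sing in the morning. Cats sleep a lot."
)
summary = summarize(text, max_sentences=2)
```

On this toy input, `summary` contains the first two sentences, because "cats" and "chase" are the most frequent words and both first appear in the opening sentence, so the next word in line ("dogs") supplies the second sentence. A real summarizer would need a proper sentence splitter and stop-word list.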
amelius, over 7 years ago
There are a few applications missing:

- Answering a question by returning a search result from a large body of texts. E.g. "How do I change the background color of a page in JavaScript?"
- Improving the readability of a text. The article only mentions "understanding how difficult to read is a text".
- Establishing relationships between entities in a body of text. E.g. we could build a fact graph from sentences like "Burning coal increases CO2" and "CO2 increase induces global warming". This is also useful in medical literature, where there are millions of pathways.
- Answering a question using a large body of facts. Like search, but it gives a precise answer.
- Finding and correcting spelling/grammatical errors.
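The fact-graph idea in the list above could be sketched roughly as follows. This is only an illustration under strong assumptions: the (subject, relation, object) triples are written by hand and normalized manually (for example, "CO2 increase" is reduced to "CO2"), whereas extracting them from raw sentences is the hard NLP part and is not shown. The helper names `build_fact_graph` and `find_chain` are hypothetical.

```python
from collections import defaultdict, deque


def build_fact_graph(triples):
    """Map each subject to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def find_chain(graph, start, goal):
    """Breadth-first search for a chain of facts linking start to goal.

    Returns a list of (subject, relation, object) edges, or None
    if the goal cannot be reached from the start.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


# Hand-written triples standing in for the output of an extraction step.
facts = [
    ("burning coal", "increases", "CO2"),
    ("CO2", "induces", "global warming"),
]
graph = build_fact_graph(facts)
chain = find_chain(graph, "burning coal", "global warming")
```

Here `chain` holds the two facts in order, which is the kind of multi-step pathway the comment has in mind. The graph is directed, so asking for a chain from "global warming" back to "burning coal" returns `None`.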
kinow, over 7 years ago
A lot to review, read, and learn. Thanks a lot for sharing this. Any plans to extend it, or to write another one covering even more, like Natural Language Generation (not limited to bots; we are using it in weather forecasts) and co-reference?
fnl, over 7 years ago
I'm always astonished how little mention gensim gets, considering that it can basically be used for all the listed tasks, including parsing, if you combine it with your favorite deep learning library (DyNet, anyone?).
pencilcode, over 7 years ago
Regarding finding similar documents, what is the state of the art nowadays: LDA, word2vec, something else? What do you normally use?
visarga, over 7 years ago
This is the first time I've seen reading time and readability scores mentioned together with NLP.
bane, over 7 years ago
I was hoping for some discussion about word vectors like word2vec. I keep reading about them, but don't really understand what they're useful for.
d23, over 7 years ago
My experience with your site on mobile: https://m.imgur.com/5vLrEJH

Can't get it to go away, can't read the article.
arcanus, over 7 years ago
Is there an equivalent to MNIST for NLP? I've always wanted to play around in this space, but I don't know a good, simple dataset to start with.
betageek, over 7 years ago
Your 'send me a PDF' popup has the background fade div above the form, so it's impossible to fill in the form (without opening dev tools).
rpedela, over 7 years ago
Using Chrome on both a Chromebook and Galaxy S5, the right sidebar is screwed up. On the phone, it completely blocks the content.
Boothroid, over 7 years ago
Quite an obnoxious website on my phone. Anyway, I came here to point to GATE as a mature FLOSS option: https://gate.ac.uk/
alexasmyths, over 7 years ago
I recommend the online course by Dan Jurafsky and Chris Manning at Stanford:

https://www.youtube.com/watch?v=nfoudtpBV68