A Guide to Natural Language Processing

418 points by ftomassetti over 7 years ago

14 comments

bhaak over 7 years ago
> Essentially, when dealing with natural languages hacking a solution is the suggested way of doing things, since nobody can figure out how to do it properly.

That's really the TL;DR I also got from the computational linguistics courses I attended.

The Pareto principle is probably at work here. Having no solution is worse than having an 80% solution that works well enough when the 100% solution is much harder to achieve (and some of the problems not even humans would be able to solve properly).
nl over 7 years ago
Ha, there's a whole section on clones of the summarizer from Classifier4J.

I wrote that in 2003 (I think?) based on @pg's "A plan for spam" essay, and then "invented" the summarization approach (I'm sure others had done similar, but I thought it up myself anyway).

Turns out it was rather well tuned. The 2003 implementation, presumably downloaded from sourceforge(!) still wins comparisons on datasets which didn't even exist when I wrote it [1].

I much prefer the Python implementation though [2], which I hadn't seen before.

Also, Textacy on top of Spacy is awesome for any kind of text work.

[1] https://dl.acm.org/citation.cfm?id=2797081

[2] https://github.com/thavelick/summarize/blob/master/summarize.py
amelius over 7 years ago
There are a few applications missing:

- Answering a question by returning a search result from a large body of texts. E.g. "How do I change the background color of a page in Javascript?"
- Improving the readability of a text. The article only mentions "understanding how difficult to read is a text".
- Establishing relationships between entities in a body of text. E.g. we could build a fact-graph from sentences like "Burning coal increases CO2", and "CO2 increase induces global warming". Useful also in medical literature where there are millions of pathways.
- Answering a question, using a large body of facts. Like search, but now it gives a precise answer.
- Finding and correcting spelling/grammatical errors.
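
The fact-graph idea in the comment above can be sketched roughly as follows. This is only an illustration, not anything from the article or the commenter: it assumes the relations have already been extracted from the sentences as (subject, relation, object) triples, which is the hard NLP part and is skipped here. The triples are hand-written, "CO2 increase" is normalized to "CO2" by hand, and the names `build_graph` and `find_path` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical, hand-written triples; a real system would have to
# extract and normalize these from raw sentences.
TRIPLES = [
    ("burning coal", "increases", "CO2"),
    ("CO2", "induces", "global warming"),
]


def build_graph(triples):
    """Map each subject to a list of (relation, object) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a chain of facts linking start to goal.

    Returns the list of triples along the chain, or None if there is none.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, obj in graph.get(node, []):
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, path + [(node, relation, obj)]))
    return None


graph = build_graph(TRIPLES)
chain = find_path(graph, "burning coal", "global warming")
```

Here `chain` holds the two facts that connect burning coal to global warming, which is the kind of indirect relationship the comment has in mind. With millions of edges, as in the medical case, a dedicated graph store would likely be a better fit than an in-memory dictionary.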
kinow over 7 years ago
A lot to review, read, learn. Thanks a lot for sharing this. Any plans to extend it or have another one including even more, like Natural Language Generation (not limited to bots, we are using it in weather forecast), and co-reference?
fnl over 7 years ago
I'm always astonished how little mention gensim gets, considering that it can basically be used for all the listed tasks, including parsing, if you combine it with your favorite deep learning library (DyNet, anyone?).
pencilcode over 7 years ago
Regarding finding similar documents, what is the state of the art nowadays: LDA, word2vec, something else? What do you normally use?
visarga over 7 years ago
First time I've seen reading time and readability score mentioned together with NLP.
bane over 7 years ago
Was hoping for some discussion about word vectors like word2vec. I keep reading about them, but don't really understand what they're useful for.
d23 over 7 years ago
My experience with your site on mobile: https://m.imgur.com/5vLrEJH

Can't get it to go away, can't read the article.
arcanus over 7 years ago
Is there an equivalent to MNIST for NLP? I've always wanted to play around in this space but I don't know a good, and simple, database to start with.
betageek over 7 years ago
Your 'send me a PDF' popup has the background fade div above the form so it's impossible to fill in the form (without opening dev tools).
rpedela over 7 years ago
Using Chrome on both a Chromebook and Galaxy S5, the right sidebar is screwed up. On the phone, it completely blocks the content.
Boothroid over 7 years ago
Quite an obnoxious website on my phone. Anyway I came here to point to GATE as a mature FLOSS option: https://gate.ac.uk/
alexasmyths over 7 years ago
Recommend Dan Jurafsky and Chris Manning @ Stanford online course:

https://www.youtube.com/watch?v=nfoudtpBV68