Ask HN: How Can I Get into NLP (Natural Language Processing)?

297 points by aarohmankad over 8 years ago
I've recently become quite intrigued by the concept, and want to learn more about it.

If you have any resources, I'd love to see them. They can be videos, articles, tutorials, courses, etc.

If there is any prerequisite knowledge required, which I assume there will be, I would also love a starting point. As for my background, I have experience in full stack web development, game development, and about a year's worth of academic computer science study.

33 comments

gsingers over 8 years ago
My co-authors and I wrote "Taming Text" (https://www.manning.com/books/taming-text) specifically for programmers (there is little math, mostly code) interested in getting started in NLP. The examples are a bit dated at this point (2013 publication date), but still applicable for someone getting started. Covers getting started, feature extraction and preprocessing, search, clustering, classification, string heuristics, Named Entity Recognition and finishes off with a simple Question Answering system. Examples are in Java. It is not an academic treatise.
erniedeferia over 8 years ago
I have found these sources useful for learning and prototyping NLP:

http://garysieling.com/blog/entity-recognition-with-scala-and-stanford-nlp-named-entity-recognizer

http://tika.apache.org

NLTK is always a good starting point: http://www.nltk.org

I also wrote a 3-part article leveraging OpenNLP with Clojure:

http://edeferia.blogspot.com/2015/03/from-natural-language-to-calendar.html

If you're interested in applying NLP without necessarily having a theoretical background, wit.ai offers some really impressive features.

Coursera also offers a good course:

https://www.coursera.org/learn/natural-language-processing
theCricketer over 8 years ago
There is a great set of lectures by Dan Jurafsky and Chris Manning: https://www.youtube.com/watch?v=nfoudtpBV68&list=PL6397E4B26D00A269

It would be helpful to have some background in Machine Learning. For a good introductory course with a mix of mathematical background, see https://see.stanford.edu/Course/CS229

NLP in the more modern systems is backed by deep neural nets. Here's a course on NLP using deep learning: https://www.youtube.com/playlist?list=PLIiVRB6G_w0i-uOoS6cDh_5nkUyxy_hxe
deepaksurti over 8 years ago
For initial learning, I would second NLTK: http://www.nltk.org

You can also check out https://github.com/vseloved/cl-nlp. It is an NLP toolkit in Common Lisp. Vsevolod, the project owner, is a great guy to work with. I contributed some minor bug fixes, tests, and documentation more than a year ago, hence the mention of Vsevolod.

You could also think along the alternative lines of contributing to an open source NLP project and building an application on top of it. Asking the owner of such a project which sample apps they would like to see might help: the apps can go into the project gallery, and you get to level up your skills. Hope this helps.
smcameron over 8 years ago
You're probably looking for something a bit more sophisticated than what I'm about to mention, but if you don't need anything too sophisticated (that is, if you can significantly limit the domain of the speech you need to be able to understand), you could do something like what I did for "the computer" on my Star Trek-like space sim Space Nerds In Space: http://hackaday.com/2016/06/08/talking-star-trek/

I used pocketsphinx (trained with a specially limited vocab) for speech to text, my own home-grown Zork-esque parser for "understanding" the text and generating responses, and pico2wave for text to speech for the responses. That's described in a bit more detail here: https://scaryreasoner.wordpress.com/2016/05/14/speech-recognition-and-natural-language-processing-in-space-nerds-in-space/
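As a rough illustration of the limited-domain, Zork-style parsing described above, here is a minimal sketch in Python. It is not the Space Nerds In Space parser: the vocabulary (`VERBS`, `NOUNS`, `FILLER`) and the function name `parse_command` are hypothetical and chosen only for the example. The idea is simply to drop filler words, require a known verb first, and pick out a known noun.

```python
import re

# Hypothetical vocabulary for illustration only; a real game would define its own.
VERBS = {"raise": "raise", "lift": "raise", "lower": "lower",
         "set": "set", "engage": "engage"}
NOUNS = {"shields", "course", "warp", "throttle"}
FILLER = {"the", "a", "an", "to", "please", "computer"}


def parse_command(text):
    """Return (verb, noun, args) for a recognised command, otherwise None."""
    # Lowercase, split into alphanumeric tokens, and drop filler words.
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in FILLER]
    # The first remaining word must be a known verb (or a synonym of one).
    if not words or words[0] not in VERBS:
        return None
    verb = VERBS[words[0]]
    rest = words[1:]
    # The first known noun after the verb is the object of the command.
    noun = next((w for w in rest if w in NOUNS), None)
    if noun is None:
        return None
    # Anything left over is passed along as arguments (e.g. a heading).
    args = [w for w in rest if w != noun]
    return verb, noun, args
```

Calling `parse_command("Computer, set a course to 270")` would yield the verb `set`, the noun `course`, and the single argument `270`, while an out-of-vocabulary sentence returns `None`. Restricting the vocabulary this way is what keeps both the speech recogniser and the parser tractable.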
dksidana over 8 years ago
https://spacy.io/ is one of the best libraries for NLP if you are using Python
sandius over 8 years ago
NLP is a huge topic, and the choice of materials pretty much depends on what you'd like to focus on. In my experience nothing beats a good textbook, especially if you do the exercises.

The classic NLP textbook is

* Jurafsky, Martin: "Speech and Language Processing" (https://web.stanford.edu/~jurafsky/slp3/) -- already mentioned here: a very solid overview textbook to give you an idea about the field;

Should you be interested in statistical NLP (even if it probably isn't as sexy as it used to be), the classic there is:

* Manning, Schütze: "Foundations of Statistical Natural Language Processing" (http://nlp.stanford.edu/fsnlp/).
lovelearning over 8 years ago
My recommendations, based on online courses and YouTube playlists I've taken:

- Coursera's old NLP course by Michael Collins, Columbia Univ. More theory and concepts. It's discontinued now on Coursera but the material is available at academictorrents. [1]

- NLP with Python and NLTK videos by sentdex [2]. Mostly programming, but with useful nuggets of concepts introduced here and there.

[1]: http://academictorrents.com/details/f99e7184fca947ee8f77901679e171fcadbf82e7

[2]: https://www.youtube.com/playlist?list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL
mrborgen over 8 years ago
I did a one week ML stunt last year: https://medium.com/learning-new-stuff/machine-learning-in-a-week-a0da25d59850#.y8pe9o9qm

I'd recommend starting with the Kaggle Bag of Words tutorial.
languagehacker over 8 years ago
Take a look at Stanford CoreNLP: http://stanfordnlp.github.io/CoreNLP/

It's relatively fast (after model load time) and quite feature-rich.
andrewtbham over 8 years ago
If you're interested in deep learning for NLP... I suggest at least some familiarity with these papers. It sorta depends on what task you want to use it for.

https://github.com/andrewt3000/dl4nlp
denzil_correa over 8 years ago
Please read through the Handbook of NLP for a nice overview.

https://karczmarczuk.users.greyc.fr/TEACH/TAL/Doc/Handbook%20Of%20Natural%20Language%20Processing,%20Second%20Edition%20Chapman%20&%20Hall%20Crc%20Machine%20Learning%20&%20Pattern%20Recognition%202010.pdf
norswap over 8 years ago
I have no particular expertise on the topic, but just in case you missed it, there is this Quora question: https://www.quora.com/How-do-I-learn-Natural-Language-Processing

It points to NLTK as the framework of choice, and has links to a couple of MOOCs and tutorials.
sundarurfriend over 8 years ago
My suggestion is, in addition to using the videos and courses for background knowledge, to take up and work on a (non-homework) project, to truly explore the area.

For example, Betty [1] is quite an interesting project with both real-life use and practical NLP considerations, and is looking for new maintainers. (I'm not affiliated, just interested in NLP myself and have been itching to get into Betty for some time.)

If you like thinking about game design, there's also the option of Interactive Fiction [2]; the NLP-involving ones are called parser-based fictions, I believe. A recent FLOSS podcast episode with folks from the IF Tech Foundation was pretty interesting and illuminating regarding this area.

[1] https://github.com/pickhardt/betty
[2] http://iftechfoundation.org/frequently-asked-questions/
du_bing over 8 years ago
Hi, some tools seem to work fine with English, so is there any good NLP tool for Chinese? Hoping for some advice, thanks in advance.
probinso over 8 years ago
Start by finding a linguist. You can find one at your local university.

Let the linguist design your first project. It should be something that they don't know how to solve, but have wanted to know.

Don't worry about whether it is feasible. Go to local data meetups when you have enough exposure to form your first questions.
carljohan over 8 years ago
Jurafsky and Martin's "Speech and Language Processing" is a great book covering a great many topics in NLP.
joelhooks over 8 years ago
We've just started adding lessons on this topic on egghead.io [0]

[0] https://egghead.io/lessons/node-js-break-up-language-strings-into-parts-using-natural
garysieling over 8 years ago
Do you want to use NLP in a project, or to dig into the state of the art?

The NLTK approach may be dated, but it is easier to approach as an engineer, especially if this is a hobby. It will give you a good introduction to problems in the space.

The math-heavy approaches may give better results long-term, but they require a much longer time commitment. They are probably more appropriate if you're trying to find a job.

You can also do interesting things with a small dataset and the free plans of APIs like Watson. E.g., I'm working on a search engine for standalone lectures - https://www.findlectures.com.
elorant over 8 years ago
I would suggest you start with “Introduction to Information Retrieval”. You can find a free version here:

http://nlp.stanford.edu/IR-book/
dukakisxyz over 8 years ago
Check out this curated list of resources dedicated to Natural Language Processing on GitHub: https://github.com/keonkim/awesome-nlp. Also, this is a great blog for understanding the business and high-level aspects of the technology: https://lekta.ai/blog/
noahshpak over 8 years ago
I got into NLP through Chris Callison-Burch's class at the University of Pennsylvania (http://mt-class.org/penn/). Great meta resource for intro readings, background, and advanced methods.

This is the textbook for the course: http://www.statmt.org/book/
totalperspectiv over 8 years ago
Has anyone read Language Processing with Perl and Prolog and have thoughts on it? I'm looking for something that goes deep on theory, but has good code examples, and is preferably a book.

https://www.amazon.com/gp/aw/d/364241463X/ref=dp_ob_neva_mobile
stass over 8 years ago
Prolog and Natural-Language Analysis [1] is great from both theoretical and practical standpoints.

[1] http://www.mtome.com/Publications/PNLA/prolog-digital.pdf
JSeymourATL over 8 years ago
Build up personal & professional contacts. Check out this group -- ACM Special Interest Group on Artificial Intelligence: https://sigai.acm.org/index.html
shanwang over 8 years ago
I'm going through the Stanford CS224d videos. I've only done 3 videos so far, and they are very theory-focused, with lots of math equations. Does anyone know other good materials on NLP using neural networks?
felix_thursday over 8 years ago
Here's a pretty comprehensive overview of NLP videos, tutorials, courses, books, etc.: http://blog.algorithmia.com/introduction-natural-language-processing-nlp/
probinso over 8 years ago
Start a project with someone. Write your own data scraper, and implement a model.
kylebgorman over 8 years ago
I would *not* recommend NLTK (or its book) or Jurafsky & Martin, or Manning & Schuetze. All are insanely dated. Watch some Coursera lectures, check out a newer, non-academic, application-oriented text, or just build something.
lifeisstillgood over 8 years ago
To the mods: vagabondjack's comment seems sensible, informative and well thought out but seems to have been de-duped in error.

Any chance of raising it out of grey-text territory?
hiou over 8 years ago
NLTK [0][1] (Natural Language Toolkit) was fantastic as an initial resource for me. Because it's a self-contained book and library, I found it to have a very smooth learning curve. There is some introductory programming stuff that you can very easily just skip in the beginning, so don't let that turn you off initially.

[0] http://nltk.org
[1] http://nltk.org/book
joesmo over 8 years ago
Check out Stanford's NLP libraries. We've been using those in production for years now. The documentation around them is not great, but the tools work well.
edblarney over 8 years ago
Watch the videos made by Jurafsky (Stanford) as a starting point.

They are quick. This will give you an overview of classical NLP.

From there, you can dig more where you want.