Ask HN: Best Text Mining Resources

59 points by big_data almost 15 years ago
I plan a deeper dive into text mining this year and am looking for suggestions on which resources are best. A friend suggested Text Mining by Weiss et al.: http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-0-387-95433-2

What would you suggest?

6 comments

helwr almost 15 years ago
- Managing Gigabytes (Witten)
- Information Retrieval (Manning)
- Text Compression (Bell)
- Natural Language Processing (Manning)
- Natural Language Understanding (Allen)
- Speech and Language Processing (Jurafsky)
- The Text Mining Handbook (Sanger)
- Statistical Machine Translation (Koehn)
- Data-Intensive Text Processing with MapReduce (Lin)
- Algorithms on Strings (Gusfield)
- Jewels of Stringology (Crochemore)
- Regular Expressions (Friedl), also: http://swtch.com/~rsc/regexp/regexp1.html and automata theory (Hopcroft)
- Practical Text Mining with Perl (Bilisoly)
- Natural Language Processing with Python (Bird)
- Computational Linguistics (Hausser)
- Syntactic Structures (Chomsky)

Also check out these links:

- http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html
- http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html
- http://www.cs.technion.ac.il/~gabr/resources/resources.html
dejv almost 15 years ago
If you want to learn more about learning over text, I recommend these lectures: http://videolectures.net/mlas06_pittsburgh/

The first two lectures are a great introduction to the topic; the third is related but not necessary.

If you want to dive deeper into more advanced material, I recommend looking at conditional random fields, which are more or less the state of the art in this field right now.

Great tutorial: http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf
Wiki entry: http://en.wikipedia.org/wiki/Conditional_random_field
mindcrime almost 15 years ago
Tapping Into Unstructured Data: http://www.amazon.com/Tapping-into-Unstructured-Data-Intelligence/dp/0132360292

Mining The Talk: http://www.amazon.com/Mining-Talk-Unlocking-Unstructured-Information/dp/0132339536/ref=sr_1_1?ie=UTF8&s=books&qid=1276405985&sr=1-1

Text Mining Application Programming: http://www.amazon.com/Text-Mining-Application-Programming/dp/1584504609/ref=sr_1_1?ie=UTF8&s=books&qid=1276406016&sr=1-1

Introduction to Information Retrieval (available freely online): http://nlp.stanford.edu/IR-book/information-retrieval-book.html
kunjaan almost 15 years ago
You could ask in the Machine Learning subreddit too: http://reddit.com/r/machinelearning
vark almost 15 years ago
- Modern Information Retrieval by Baeza-Yates
- Data Mining book by Jiawei Han et al.
- Managing Gigabytes by Witten et al.
- Hypertext mining book by Chakrabarti
big_data almost 15 years ago
Great stuff here, sure to get me going in the right direction! Thank you all!