I plan a deeper dive into text mining this year and am looking for suggestions on the best resources. A friend suggested Text Mining by Weiss et al.: http://www.springer.com/computer/database+management+%26+information+retrieval/book/978-0-387-95433-2<p>What would you suggest?
Managing Gigabytes (Witten)<p>Information Retrieval (Manning)<p>Text Compression (Bell)<p>Natural Language Processing (Manning)<p>Natural Language Understanding (Allen)<p>Speech and Language Processing (Jurafsky)<p>The Text Mining Handbook (Sanger)<p>Statistical Machine Translation (Koehn)<p>Data-Intensive Text Processing with MapReduce (Lin)<p>Algorithms on Strings (Gusfield)<p>Jewels of Stringology (Crochemore)<p>Regular Expressions (Friedl),
also: <a href="http://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">http://swtch.com/~rsc/regexp/regexp1.html</a>
and automata theory (Hopcroft)<p>Practical Text Mining with Perl (Bilisoly)<p>Natural Language Processing with Python (Bird)<p>Computational Linguistics (Hausser)<p>Syntactic Structures (Chomsky)<p>Also check out these links: <a href="http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html" rel="nofollow">http://measuringmeasures.blogspot.com/2010/01/learning-about...</a><p><a href="http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html" rel="nofollow">http://measuringmeasures.com/blog/2010/3/12/learning-about-m...</a><p><a href="http://www.cs.technion.ac.il/~gabr/resources/resources.html" rel="nofollow">http://www.cs.technion.ac.il/~gabr/resources/resources.html</a>
If you want to learn more about learning over text, I'd recommend these lectures: <a href="http://videolectures.net/mlas06_pittsburgh/" rel="nofollow">http://videolectures.net/mlas06_pittsburgh/</a><p>The first two lectures are a great introduction to the topic; the third is related but not essential.<p>If you want to dive deeper into more advanced material, I'd recommend looking at conditional random fields, which are more or less the state of the art in this field right now.<p>A great tutorial: <a href="http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf" rel="nofollow">http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf</a>
Wiki entry: <a href="http://en.wikipedia.org/wiki/Conditional_random_field" rel="nofollow">http://en.wikipedia.org/wiki/Conditional_random_field</a>
You could ask in the Machine Learning subreddit too: <a href="http://reddit.com/r/machinelearning" rel="nofollow">http://reddit.com/r/machinelearning</a>
- Modern Information Retrieval by Baeza-Yates<p>- Data Mining book by Jiawei Han et al.<p>- Managing Gigabytes by Witten et al.<p>- Hypertext Mining book by Chakrabarti