I own the paper version of this. Just a heads-up: it's written primarily for language scientists rather than developers. You may find yourself constantly pausing to look up concepts and their meanings or practical uses on linguistics websites, then returning to the book.<p>It's a great resource, but don't expect to get started quickly.
LinkGrammar: <a href="http://www.link.cs.cmu.edu/link/" rel="nofollow">http://www.link.cs.cmu.edu/link/</a> is another NLP toolkit usable from Python. It may suit those who are intimidated by the need to write their own grammar.<p>It is integrated into AbiWord and maintained by its authors. There is a talk on this library at PyCon this year: <a href="https://us.pycon.org/2012/schedule/presentation/187/" rel="nofollow">https://us.pycon.org/2012/schedule/presentation/187/</a>
If you're interested in natural language processing (NLP), but don't have a linguistics background, I would suggest reading Steven Pinker's The Language Instinct. It will introduce you to the necessary terminology and concepts for NLP in an easy-to-digest way. (The NLTK book has been free online for quite some time as well.)
The book and software are both great resources. I usually "roll my own" NLP software, but I have used NLTK for small customer text mining tasks. Definitely "batteries included." I bought the book years ago, but the online edition may be more current by now.
Probably up there among the most useful Python libraries, IMO. Hasn't it been available for free online for a long time now, though?<p>Anyway, in case anyone reading this missed it: the Stanford NLP class taught by Chris Manning and Dan Jurafsky, starting next week (Jan 23rd), will allow programming assignments to be submitted using Python and NLTK, which is really good news.<p>So now's a good time to get familiar with NLTK, or to brush up for those of us already acquainted with it.
We're using this library in class (Foundations of Language Technology) at the TU Darmstadt. From my point of view as a student who hasn't done much Python in the past, it's pretty easy to use and works well for learning about NLP: it hides many implementation details and lets me focus on solving fairly complex tasks at the data and algorithm level.
There is a free NLTK cloud API: <a href="http://www.mashape.com/apis/Text-Processing" rel="nofollow">http://www.mashape.com/apis/Text-Processing</a><p>It includes sentiment analysis, stemming and lemmatization, part-of-speech tagging and chunking, phrase extraction and named entity recognition.
I came across <i>Natural Language Processing for the Working Programmer</i>[1] recently. It's released under a Creative Commons license (CC-BY). It's still a work in progress, but might be interesting anyhow.<p>[1]: <a href="http://nlpwp.org/book/" rel="nofollow">http://nlpwp.org/book/</a>
ipython+nltk+networkx+pytables+numpy+matplotlib=full of win<p>Seriously - text mining made fun and exploratory with open source tools.<p>One of the text sources I like to use is the Launchpad tickets for the Ubuntu project, since they get a good amount of traffic from international end users, a professional interest of mine.<p>It would be great to hear about some other interesting open data sets that people have found.
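For anyone wondering what the simplest version of that kind of exploratory text mining looks like, here is a minimal sketch using only the Python standard library. It is not the workflow described above: the ticket titles are invented placeholders rather than real Launchpad data, the stopword list is a hand-picked assumption, and a real session would presumably swap the regex tokenizer for NLTK's tokenizers and corpora.

```python
import re
from collections import Counter

# Hypothetical ticket titles, invented purely for illustration.
TICKETS = [
    "Wifi drops after suspend on laptop",
    "Keyboard layout resets after suspend",
    "Wifi password prompt keeps reappearing",
]

# A tiny hand-picked stopword list (an assumption, not NLTK's list).
STOPWORDS = {"after", "on", "the", "a", "keeps"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(docs):
    """Count how often each non-stopword token occurs across all docs."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


if __name__ == "__main__":
    # Print the three most frequent terms with their counts.
    for term, n in term_frequencies(TICKETS).most_common(3):
        print(term, n)
```

Pasting the two functions into an IPython session and calling `term_frequencies` on your own list of strings is enough to start poking at a data set; the resulting `Counter` can then be handed to matplotlib or similar for plotting.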