TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.
© 2025 TechEcho. All rights reserved.

Ask HN: What are some good open-source language/textual analysis tools?

5 points by CoreSet over 10 years ago
I'm looking to do linguistic/textual analysis on a large amount of text I've scraped for a research project, computing stats such as frequently used words, associated topic clusters, and gender estimations.

I wrote the scraper myself, but for the language analysis it seems easier to find open-source tools I can use out of the box or with slight modifications.

Does anyone have ideas or leads? My preference is for a script or process I can run from the command line to output the vitals.
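For the "frequently used words" part, even a few lines of standard-library Python get you a command-line tool. This is a minimal sketch, not any particular project's script: the file name `top_words.py`, the function `top_words`, and the tiny stopword list are all hypothetical placeholders (a real run would want a fuller stopword list, such as the one NLTK ships).

```python
import re
import sys
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "i"}

def top_words(text, n=10):
    """Return the n most frequent non-stopword tokens in text."""
    # Lowercase, then treat runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Ties keep the order in which words were first seen.
    return counts.most_common(n)

if __name__ == "__main__":
    # Usage (hypothetical file names): python top_words.py scraped1.txt scraped2.txt
    text = ""
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text += f.read() + "\n"
    for word, count in top_words(text, n=20):
        print(f"{count}\t{word}")
```

Printing tab-separated `count word` lines keeps the output easy to pipe into `sort`, `head`, or a spreadsheet.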

4 comments

whitej125 over 10 years ago
If you're a Python person (Python is very popular in the data science world these days), your gateway drug to linguistic and textual analysis is going to be NLTK: http://www.nltk.org/

The free book and tutorials are great, and you can get up and running pretty quickly.

NLTK's gentle learning curve is great for getting your head around NLP concepts. Once you start looking for more functionality or performance, you'll find yourself graduating to scikit-learn: http://scikit-learn.org/stable/

In the Java world, I think Mahout is (or was) popular, though there is quite a bit more setup to get through before it is up and running.
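A common first step toward "which words characterize which documents" (and a frequent input to topic clustering) is TF-IDF weighting. NLTK and scikit-learn both provide this, but a from-scratch sketch shows the idea. It assumes the plain variant tf = count / document length and idf = log(N / df). scikit-learn's `TfidfVectorizer` instead uses a smoothed idf and L2-normalizes rows by default, so its numbers will differ. The names `tokenize` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf(docs):
    """Return one {term: weight} dict per document (plain tf * log(N/df))."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        weights.append({
            # Terms found in every document get idf = log(1) = 0.
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights
```

For example, `tfidf(["cats purr", "dogs bark", "cats and dogs"])` weights "purr" above "cats" in the first document, because "cats" also appears elsewhere.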
manidoraisamy over 10 years ago
Stanford NLP (CoreNLP) is pretty good if you're on Java: http://nlp.stanford.edu/software/corenlp.shtml

You might also want to look at word2vec (implemented in most of the popular languages): https://code.google.com/p/word2vec/
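To give a feel for what word2vec learns from, here is a sketch of the data-preparation step of its skip-gram variant: every word is paired with the words within a fixed window around it, and the model is trained to predict those context words. Only the pair generation is shown, not the training itself. The function name `skipgram_pairs` and the fixed window are assumptions made for clarity; the original C tool also randomly shrinks the window at each position.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram.

    Each word within `window` positions of the center word (excluding the
    center itself) is emitted as a context word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Words that keep showing up in similar pairs end up with similar vectors, which is what makes word2vec useful for finding "associated" words.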
wallflower over 10 years ago
This seems to have some good starting pointers: http://blog.datadive.net/which-topics-get-the-upvote-on-hacker-news/
biomimic over 10 years ago
This text summarizer will be open-sourced soon: http://genopharmix.com/TuataraSum
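The thread doesn't say how TuataraSum works, so the following is not its method. It is a generic frequency-based extractive summarizer, shown only to illustrate the basic idea behind many simple summarizers. Each sentence is scored by the average corpus frequency of its words, and the top sentences are kept in their original order. The function name `summarize`, the naive punctuation-based sentence splitter, and the lack of stopword filtering are all simplifying assumptions.

```python
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        # Average word frequency, so long sentences aren't automatically favoured.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

In practice you would filter stopwords before counting, or "the" and "and" will dominate the scores.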