Show HN: TextBlob, Natural language processing made simple in Python

303 points by sloria, almost 12 years ago

19 comments

eliben, almost 12 years ago
Yay #1: a nice wrapper around NLTK. NLTK is great, but its API is not very Pythonic or comfortable. Pleasant facades over it are a great help for Python NLP.

Yay #2: an actually interesting programming-related article on HN. These get rarer every day, losing their place to gossip about what Snowden remarked following one or another NSA official's remarks about Snowden's even earlier remarks.
mrkmcknz, almost 12 years ago
Just a quick word on Pattern [1].

TextBlob is probably just using the en module. I would suggest everyone take a look at the other modules, in particular the web module, should you be doing any light data scraping. It has nice wrappers around BeautifulSoup and Scrapy, among others; jumping straight into BeautifulSoup and Scrapy can be daunting for beginners.

[1] http://www.clips.ua.ac.be/pages/pattern
eterm, almost 12 years ago
I've had good fun playing around with this; it's certainly made NLP more approachable.

One issue, though, is that it seems to choke on certain characters.

For instance, given the character £, it complains with this error message:

    >>> TextBlob("£")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/eterm/nlp/local/lib/python2.7/site-packages/text/blob.py", line 340, in __repr__
        return unicode("{cls}('{text}')".format(cls=class_name, text=self.raw))
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 10: ordinal not in range(128)
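The traceback above is consistent with Python 2 implicitly decoding a UTF-8 byte string with the ASCII codec. A minimal sketch in Python 3 that makes the same failure explicit follows; the byte string is an assumption about what `__repr__` was building, not a copy of TextBlob's actual code path.

```python
# Assumption: under Python 2 the formatted repr was a UTF-8 byte string,
# which unicode() then decoded with the default ASCII codec.
raw = "TextBlob('\u00a3')".encode("utf-8")  # the pound sign becomes bytes 0xC2 0xA3

try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    # Keep the details, since `err` is unbound once the except block ends.
    position = err.start
    failing_byte = raw[position]

print(hex(failing_byte), position)

# Decoding with the codec that matches the bytes succeeds.
text = raw.decode("utf-8")
print(len(text))
```

The failing offset is 10 because `TextBlob('` is ten ASCII characters long, so the first byte of the pound sign is the first one the ASCII codec cannot handle.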
feniv, almost 12 years ago
The NodeBox linguistics module is another nice wrapper around NLTK (and other natural language processing libraries). I used it for extracting actions and details from sentences, but it's also great for spelling correction, pluralization, part-of-speech tagging and other common NLP tasks.

http://nodebox.net/code/index.php/Linguistics
Ihmahr, almost 12 years ago
I work on NLP with Python both for my studies and for my side job.

Sorry, but I think this thing is very much overrated by the HN crowd. There are many such libraries and this one adds exactly nothing. I also don't see how this is easier to use than, let's say, Pattern.

Try adding new functionality. One new feature could be to use an ontology to calculate the distance between two words. Then you can do other cool things with that and place it in your module.
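To make the ontology suggestion concrete, here is a minimal sketch of a path-based word distance over a toy "is-a" hierarchy. The `PARENT` table and both function names are invented for illustration; a real implementation would more likely draw on a resource such as WordNet than on a hand-written dictionary.

```python
# Toy "is-a" ontology: each word points to its parent concept.
# These entries are hypothetical and exist only for this example.
PARENT = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def ancestors(word):
    """Return the chain from word up to the root, starting with word itself."""
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def distance(a, b):
    """Count the edges between a and b via their lowest shared ancestor."""
    depth_from_a = {node: steps for steps, node in enumerate(ancestors(a))}
    for steps_from_b, node in enumerate(ancestors(b)):
        if node in depth_from_a:
            return depth_from_a[node] + steps_from_b
    return None  # the two words share no ancestor


print(distance("dog", "wolf"))
print(distance("dog", "cat"))
print(distance("dog", "sparrow"))
```

Words that meet lower in the hierarchy get smaller distances, so "dog" ends up closer to "wolf" than to "cat", and closer to "cat" than to "sparrow".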
eieio, almost 12 years ago
This looks great! NLTK is incredible but can definitely be a bit intimidating. Very cool to have a wrapper around it.

I'm curious to see exactly how it works, so I'll certainly check out the source when I have a bit more time. Thanks for posting this.
the_cat_kittles, almost 12 years ago
If you could add a blob.target and a default vectorizer, you could use scikit-learn to offer some nice classification and regression. It's pretty easy to do that with what you have now, but some of those concepts are a little foreign if you haven't done text classification before, like me before yesterday. The part-of-speech tagging stands out in particular: using those tags as features could be powerful alongside n-grams.
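As a rough illustration of combining n-grams with part-of-speech tags as features, here is a minimal pure-Python sketch. The `featurize` helper, the feature-name scheme and the hand-tagged sentence are all assumptions made for the example; none of them are part of TextBlob or scikit-learn.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(tagged_tokens, max_n=2):
    """Count word n-grams and POS-tag n-grams for one document."""
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    features = Counter()
    for n in range(1, max_n + 1):
        features.update("word=" + " ".join(gram) for gram in ngrams(words, n))
        features.update("pos=" + " ".join(gram) for gram in ngrams(tags, n))
    return features


# Hand-tagged example; in practice the (word, tag) pairs would come from a tagger.
doc = [("The", "DT"), ("movie", "NN"), ("was", "VBD"), ("great", "JJ")]
target = "positive"  # hypothetical label, playing the role of the suggested blob.target

features = featurize(doc)
print(features["word=great"])
print(features["pos=DT NN"])
print(len(features))
```

Because the features are a plain mapping from string names to counts, a list of such dictionaries could be handed to something like scikit-learn's `DictVectorizer`, with the `target` values supplied as labels.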
shirkey, almost 12 years ago
For the Google Translate functionality, does this pass the request through an intermediary service or direct to the API?
mattdeboard, almost 12 years ago
So after poking around with this for a bit, I will say that it DEFINITELY is vulnerable to Python 2's string handling warts. Constructing a `TextBlob` out of a string with non-ASCII characters doesn't seem to work. I created another virtualenv with Python 3 and it works quite well.
mark_l_watson, almost 12 years ago
I played with this a few days ago. It is a nice wrapper for NLTK. You probably want to, at some point, read the free NLTK book online.

Edit: and it also uses Pattern.
sixQuarks, almost 12 years ago
Can someone explain what this does in layman's terms? I'm a biz guy, not a coder, but I'm interested in the use cases. Thanks.
throwawayg99, almost 12 years ago
This is awesome. I looked, but couldn't find out: is there a word sense disambiguation layer hidden somewhere in here?
sumit_psp, almost 12 years ago
Curious as to what training algorithms you used for sentiment analysis. Also, can I add my own domain-specific training set?
dpmehta02, almost 12 years ago
This looks great, thanks for sharing.

Any thoughts or relevant benchmarks you would like to share about its speed?
tomrod, almost 12 years ago
Awesome! Thanks for posting. Are you the hacker that put it together?
gpsarakis, almost 12 years ago
Seems to have an incredibly easy interface. Will test it. Well done!
aswanson, almost 12 years ago
Thanks, I plan on using this.
photorized, almost 12 years ago
Awesome. I could use this.
misiti3780, almost 12 years ago
This looks great.