A First Excercise in Natural Language Processing with Python: Counting Hapaxes

90 points by cristoperb over 7 years ago

5 comments

dec0dedab0de over 7 years ago
I get that the point is to be an introduction to the libraries and whatnot, but was I the only one who immediately thought of just using Counter?

    from collections import Counter
    import re

    # \w+ rather than \w*, so that empty matches are not counted as words
    [word for word, count in Counter(re.findall(r'\w+', text.lower())).items() if count == 1]
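
The Counter approach in this comment can be wrapped as a small function. What follows is a minimal sketch, not the article's method: the function name `hapaxes` and the sample sentence are made up for illustration, and it assumes `\w+` is an acceptable definition of a word (contractions such as "don't" would be split).

```python
from collections import Counter
import re


def hapaxes(text):
    """Return the words that occur exactly once in text, in first-seen order."""
    # Lowercase first so "The" and "the" are counted as the same word.
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [word for word, count in counts.items() if count == 1]


# Hypothetical sample text, for illustration only.
sample = "The cat sat on the mat. The dog sat too."
print(hapaxes(sample))  # ['cat', 'on', 'mat', 'dog', 'too']
```

Counter keeps keys in insertion order on Python 3.7+, which is why the result follows the order in which words first appear.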
newman8r over 7 years ago
for anyone interested in more good beginner resources, I really enjoyed this youtube playlist on python NLTK: https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

Edit: I accidentally linked to another good playlist, but here's the first vid of the NLTK list from the same user: https://www.youtube.com/watch?v=FLZvOKSCkxY
grabcocque over 7 years ago
Hapax Legomenon is such a satisfying phrase to say. Even the opportunity to look at it makes my eyes happy.
visarga over 7 years ago
I counted word n-grams up to length 6 in a corpus of 6 billion words with Madoka, a Count-Min sketch algorithm.

https://pypi.python.org/pypi/madoka
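
For readers unfamiliar with the idea, here is a rough pure-Python sketch of a Count-Min sketch applied to word n-grams. It is not Madoka's API or implementation: the class `CountMinSketch`, the helper `ngrams`, the table dimensions, and the sample sentence are all assumptions chosen for illustration. The point is only that counts are stored in a fixed-size table, so memory does not grow with the number of distinct n-grams, at the cost of estimates that can overshoot but never undershoot.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: fixed memory, counts may be overestimated."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._cells(item))


def ngrams(tokens, n):
    """Yield each run of n consecutive tokens joined by spaces."""
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n])


# Hypothetical sample text, for illustration only.
tokens = "the cat sat on the mat and the cat slept".split()
sketch = CountMinSketch()
for n in (1, 2):
    for gram in ngrams(tokens, n):
        sketch.add(gram)

print(sketch.estimate("the cat"))  # at least 2, the true count
```

Because the estimate is an upper bound, a sketch like this suits finding frequent n-grams better than finding hapaxes, where a single collision can hide a word that occurs once.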
cristoperb over 7 years ago
Author here. The misspelling in the title is embarrassing, but luckily not very noticeable (I've fixed it on the site).