A First Excercise in Natural Language Processing with Python: Counting Hapaxes

90 points by cristoperb over 7 years ago

5 comments

dec0dedab0de over 7 years ago
I get that the point is to be an introduction to the libraries and whatnot, but was I the only one who immediately thought of just using Counter?

```python
from collections import Counter
import re

# text is the input string; \w+ (not \w*) keeps empty matches out of the counts
hapaxes = [word for word, count in Counter(re.findall(r'\w+', text.lower())).items() if count == 1]
```
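A self-contained version of the Counter approach, wrapped in a hypothetical `hapaxes()` helper (the name is illustrative, not taken from the article), might look like the sketch below. It also shows why `\w+` is preferable to `\w*`: the latter can match zero characters, so on Python 3.7+ `re.findall` returns empty strings alongside the words, and those get counted as if they were tokens.

```python
import re
from collections import Counter
from typing import List


def hapaxes(text: str) -> List[str]:
    """Return the words that occur exactly once in text, ignoring case.

    Hypothetical helper; tokenization is a plain word-character regex, not NLTK's tokenizer.
    """
    words = re.findall(r"\w+", text.lower())  # each token has at least one word character
    counts = Counter(words)
    return [word for word, count in counts.items() if count == 1]


# \w* can match the empty string, so this result contains '' entries (Python 3.7+)
print(re.findall(r"\w*", "a b"))
print(hapaxes("The cat saw the dog"))  # ['cat', 'saw', 'dog']
```

Because `Counter` keeps first-insertion order, the hapaxes come back in the order they first appear in the text.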
newman8r over 7 years ago
For anyone interested in more good beginner resources, I really enjoyed this YouTube playlist on Python NLTK: https://www.youtube.com/watch?v=OGxgnH8y2NM&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v

Edit: I accidentally linked to another good playlist, but here's the first vid of the NLTK list from the same user: https://www.youtube.com/watch?v=FLZvOKSCkxY
grabcocque over 7 years ago
Hapax Legomenon is such a satisfying phrase to say. Even the opportunity to look at it makes my eyes happy.
visarga over 7 years ago
I counted word n-grams up to length 6 in a corpus of 6 billion words with Madoka, a Count-Min sketch library.

https://pypi.python.org/pypi/madoka
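For readers unfamiliar with the technique, here is a toy Count-Min sketch in pure Python. It is not Madoka's API; the class name, the width/depth defaults, and the salted BLAKE2b hashing are assumptions chosen for illustration. Each key increments one counter per row, and the estimate is the minimum across rows, so hash collisions can inflate a count but never deflate it.

```python
import hashlib
from typing import Iterator, List, Tuple


class CountMinSketch:
    """Toy Count-Min sketch: `depth` rows of `width` counters each (illustrative only)."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str) -> Iterator[Tuple[int, int]]:
        # One hash per row: BLAKE2b salted with the row number picks a column
        data = key.encode("utf-8")
        for row in range(self.depth):
            digest = hashlib.blake2b(data, digest_size=8, salt=row.to_bytes(16, "little")).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever add to a counter, so the smallest one is the tightest bound
        return min(self.table[row][col] for row, col in self._cells(key))


def ngrams(tokens: List[str], max_n: int = 6) -> Iterator[str]:
    """Yield every word n-gram of length 1..max_n as a space-joined string."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


sketch = CountMinSketch()
for gram in ngrams("the cat saw the cat".split()):
    sketch.add(gram)
print(sketch.estimate("the cat"))  # at least 2; more only if every row collides
```

Memory stays fixed at width × depth counters regardless of how many distinct n-grams arrive, which is what makes this kind of structure practical at billions of words. The trade-off is that the sketch stores no keys: it can answer "how often did this n-gram occur?" but cannot by itself list which n-grams occurred exactly once.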
cristoperb over 7 years ago
Author here. The misspelling in the title is embarrassing, but luckily not very noticeable (I've fixed it on the site).