TechEcho
Simple tf-idf in 30 lines of Idiomatic Clojure

23 points by ithayer almost 14 years ago

2 comments

unwind almost 14 years ago
Apparently, everyone knows that tf-idf stands for "term frequency-inverse document frequency". I had no idea, and the article didn't have time to include a link to http://en.wikipedia.org/wiki/Tf%E2%80%93idf or even type out the acronym.
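For readers in the same position, here is a minimal Clojure sketch of the textbook definition: tf(t, d) is the count of term t in document d divided by the length of d, and idf(t) = log(N / df(t)), where N is the number of documents and df(t) is how many of them contain t. This is not the article's code (which isn't reproduced here); the function names and the tokenizer (lowercase, split on non-word characters) are assumptions for illustration only.

```clojure
(ns tfidf-sketch
  (:require [clojure.string :as str]))

;; Hypothetical tokenizer: lowercase the text, split on runs of
;; non-word characters, and drop any empty strings left over.
(defn tokenize [text]
  (remove str/blank? (str/split (str/lower-case text) #"\W+")))

;; Term frequency: occurrences of each term divided by document length.
(defn tf [tokens]
  (let [n (count tokens)]
    (into {} (for [[term c] (frequencies tokens)]
               [term (/ (double c) n)]))))

;; Inverse document frequency: log(N / df), where df counts the
;; documents that contain the term at least once.
(defn idf [docs]
  (let [n  (count docs)
        df (frequencies (mapcat distinct docs))]
    (into {} (for [[term d] df]
               [term (Math/log (/ (double n) d))]))))

;; One map of term -> tf-idf score per input text.
(defn tf-idf [texts]
  (let [docs (map tokenize texts)
        idfs (idf docs)]
    (for [doc docs]
      (into {} (for [[term f] (tf doc)]
                 [term (* f (idfs term))])))))
```

With (tf-idf ["the cat sat" "the dog sat" "the cat ran"]), "the" occurs in every document, so its idf is log(1) = 0 and it scores 0.0 everywhere, while "dog" scores (1/3) * log 3 in the second document. That is the point of the idf factor: terms common to the whole corpus stop dominating.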
mduerksen almost 14 years ago
Two remarks:

1. Don't 'earmuff' your stopwords, since you don't intend them to be rebound. The corresponding guideline can be found here: http://dev.clojure.org/display/design/Library+Coding+Standards

2. You could replace (remove nil? (map db (tokenize raw-text))) with (keep db (tokenize raw-text))
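Both remarks can be illustrated in a few lines. This is a sketch, not the article's code: the stopword set, the tokenizer, and the db map below are hypothetical stand-ins. A map works as db because Clojure maps are functions of their keys and return nil for missing keys.

```clojure
(ns keep-sketch
  (:require [clojure.string :as str]))

;; Remark 1: a plain constant, so no earmuffs. Earmuffs (*stopwords*)
;; signal a ^:dynamic var meant to be rebound with `binding`; on a
;; non-dynamic var, Clojure 1.3+ warns about the mismatch.
(def stopwords #{"a" "an" "the" "of" "and"})

;; Hypothetical stand-in for the article's `db`: known term -> id.
(def db {"cat" 0, "dog" 1, "sat" 2})

;; Hypothetical tokenizer; the stopword set doubles as a predicate.
(defn tokenize [raw-text]
  (->> (str/split (str/lower-case raw-text) #"\W+")
       (remove str/blank?)
       (remove stopwords)))

;; Remark 2, before: map every token, then discard the nils.
(defn ids-verbose [raw-text]
  (remove nil? (map db (tokenize raw-text))))

;; After: keep applies db and drops nil results in one step.
(defn ids [raw-text]
  (keep db (tokenize raw-text)))
```

For "The cat sat on a mat" both versions return (0 2). The two are equivalent because keep, like (remove nil? ...), discards only nil; a false result would be kept by either one.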