科技回声 (Tech Echo)


© 2025 科技回声 (Tech Echo). All rights reserved.

How to compare two articles

2 points by bdouglas, almost 17 years ago
Hi,

I'm trying to figure out what ways there are to compare two separate articles and determine whether they are the same.

I'm currently researching semantic analysis, but figured I'd ask here as well.

Thoughts/comments?

Thanks,
bd

3 comments

raffi, almost 17 years ago
Hi bd,

Ironically, yesterday I uploaded a tech demo of something I call kindling, which attempts to correlate articles against news feeds from social websites.

I read a book called Programming Collective Intelligence by Toby Segaran. It's basically machine learning for dummies: very example-heavy, all in Python.

He talks about clustering to group similar things together in an unsupervised way. The way this works is to build a vector of words from each article and compare these vectors using something known as Pearson distance. The vector of words is known as a feature set. Early on you create this vector in a naive way (i.e. eliminate words that don't show up enough and words that show up too much). At the end of the book he talks about feature detection (which I assume is building this vector in a smarter way).

The book really helped me. Pearson correlation is pretty easy to grasp and implement as well.

Good luck.
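A minimal Python sketch of the approach described above, assuming a small corpus of plain-text articles. It builds a naive feature set by dropping words that occur in too few or too many articles, then compares count vectors with Pearson distance (1 minus the Pearson correlation). The function names, the document-frequency thresholds `min_frac`/`max_frac`, and the toy corpus are illustrative assumptions, not code from the book.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_feature_set(articles, min_frac=0.1, max_frac=0.5):
    """Naive feature selection: keep only words that occur in at least
    min_frac and at most max_frac of the articles, then turn each article
    into a vector of counts over that shared vocabulary."""
    counts = [word_counts(a) for a in articles]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each article counts once per word
    n = len(articles)
    vocab = sorted(w for w, df in doc_freq.items()
                   if min_frac <= df / n <= max_frac)
    vectors = [[c[w] for w in vocab] for c in counts]
    return vocab, vectors


def pearson(v1, v2):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(y * y for y in v2)
    p_sum = sum(x * y for x, y in zip(v1, v2))
    num = p_sum - sum1 * sum2 / n
    den = math.sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    if den == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return num / den


def pearson_distance(v1, v2):
    """1 - r: 0 for perfectly correlated vectors, up to 2 for opposite ones."""
    return 1.0 - pearson(v1, v2)


# Hypothetical toy corpus: two articles about compilers, two about soccer.
articles = [
    "the python compiler parses python code and the compiler emits bytecode",
    "the compiler turns python code into bytecode for the python interpreter",
    "the soccer team won the match and the fans cheered the team",
    "the team lost the match while the fans watched the soccer game",
]
# With only four articles, keep words found in 1 to 3 of them;
# "the" occurs in all four and is dropped as too common.
vocab, vectors = build_feature_set(articles, min_frac=0.25, max_frac=0.75)

for i in range(1, len(articles)):
    print(f"article 0 vs {i}: {pearson_distance(vectors[0], vectors[i]):.3f}")
```

On this toy corpus the two compiler articles come out closer to each other than to either soccer article. The default thresholds are arbitrary and would need tuning on a real corpus; deciding whether two articles are "the same" would then come down to picking a distance cutoff, which is also an assumption left to the reader.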
MaysonL, almost 17 years ago
There's a great Google Tech Talk on this subject:

http://www.youtube.com/watch?v=AyzOUbkUf3M
jfarmer, almost 17 years ago
What do you mean "are the same?"