科技回声

3 comments

raffi, almost 17 years ago

Hi Bd. Ironically, yesterday I uploaded a tech demo of something I call kindling, which attempts to correlate articles against news feeds from social websites.

I read a book called Programming Collective Intelligence by Toby Segaran. It's basically machine learning for dummies: very example-heavy, all in Python.

He talks about clustering to group like things together in an unsupervised way. The way this works is to build a vector of word counts for each article and compare these vectors using something known as Pearson distance. The vector of words is known as a feature set. Early on you create this vector in a naive way (i.e. eliminate words that don't show up often enough and words that show up too often). At the end of the book he talks about feature detection (which I assume is building this vector in a smarter way).

The book really helped me. Pearson correlation is pretty easy to grasp and implement as well.

Good luck.
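For readers who want to try the idea in the comment above, here is a minimal sketch in Python. It is not the book's code: the function names (`word_counts`, `pearson`, `article_distance`) and the simple regex tokenizer are assumptions made for illustration, distance is taken to be 1 minus the Pearson correlation, and the step of dropping very rare and very common words is left out.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (naive regex tokenizer, an assumption)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n == 0:
        return 0.0
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant vector has zero variance; treat that case as "no correlation".
    return cov / denom if denom else 0.0


def article_distance(text_a, text_b):
    """Distance between two articles: 1 - Pearson correlation of their word counts."""
    counts_a = word_counts(text_a)
    counts_b = word_counts(text_b)
    # Shared vocabulary so both vectors line up word for word.
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[w] for w in vocab]
    vec_b = [counts_b[w] for w in vocab]
    return 1.0 - pearson(vec_a, vec_b)
```

A distance near 0 means the two articles use words in nearly the same proportions, while values at or above 1 mean the word counts are uncorrelated or negatively correlated. On real articles you would likely filter the vocabulary first, as the comment describes.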

MaysonL, almost 17 years ago

There's a great Google tech talk on this subject: http://www.youtube.com/watch?v=AyzOUbkUf3M

jfarmer, almost 17 years ago

What do you mean "are the same?"

How to compare two articles..
