More data usually beats better algorithms

64 points by toffer, about 17 years ago

5 comments

gaika, about 17 years ago
Not true - from the current leaders of the Netflix Prize:

"Our experiments clearly show that once you have strong CF models, such extra data is redundant and cannot improve accuracy on the Netflix dataset."

http://glinden.blogspot.com/2008/03/using-imdb-data-for-netflix-prize.html
michaelneale, about 17 years ago
Well - that's what Google keeps saying. Specifically, I think Peter Norvig has said over and over that he hasn't had access to this much data before and that it's fascinating to him.

Ah, Peter Norvig, responsible for hours of my time whiled away on his web site with all sorts of knowledge porn.
jsomers, about 17 years ago
"Team B used a very simple algorithm, but they added in additional data beyond the Netflix set: information about movie genres from the Internet Movie Database (IMDB)."

I had no idea IMDb had so much genre data. E.g., a "keywords" page for every movie [http://www.imdb.com/title/tt0062622/keywords] and, for every keyword, maps of (a) related keywords and (b) movies that mention it [cf. http://www.imdb.com/keyword/metaphysical/].

Very cool.
stcredzero, about 17 years ago
More like bad/insufficient data defeats even good algorithms. When we recommend movies to friends, we are often using very different and more useful information than what's in the Netflix database.
csmajorfive, about 17 years ago
I did the same thing last semester, inspired by a class at Cornell. We came up with a very, very simple graph-based algorithm that beats the competition's baseline (not quite BellKor level, but there's lots of tweaking left to be done).

Now, I was under the impression that using extra proprietary data (like IMDb) is outside the bounds of the competition. Can anyone shed some light on this? Maybe I should pick up the project again!