My Python Code for the Netflix Prize

130 points by alexbw, almost 13 years ago
I competed alone in the Netflix Prize in college under the team name "Hi!". I've never seen anybody release their code, and now that I'm getting back into machine learning, I realized that some folks might want to take a gander at a competitive machine learning codeset.

It's implemented mostly in Python, with Cython for the real speed-sensitive parts (everything in the file "svd.pyx" did the heavy lifting and got me up the leaderboard).

I hope that some folks will find this useful.

14 comments

Nogwater, almost 13 years ago
Here's mine if anyone is interested. I wrote it in D and haven't looked at it in years. I'm sure it's not usable as-is, but it might be fun anyway.

https://github.com/nogwater/NetflixPrizeD

The algorithm is based on Simon Funk's blog post here: http://sifter.org/~simon/journal/20061211.html

For me, the best part was squeezing the data and indexes into memory. :)
alexbw, almost 13 years ago
@tuananh I've got the dataset stored away, but I don't know if I'm legally allowed to post it. Would love if someone could produce proof one way or the other.

@viraj_shah I spent about 6 months working on the project before I had to stop to concentrate on my schoolwork (I was a senior in college at the time). I think it would have been impossible to do this for myself without Cython. If it were to happen today, I would probably be writing in PyCUDA, or with Numba, and it would be much, much, MUCH more succinct.
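For readers who have not seen the approach described in Simon Funk's post, the core idea is to learn a short vector of factors per user and per movie by stochastic gradient descent, so that their dot product approximates the known ratings. The following is only a minimal sketch under stated assumptions: the tiny rating list, the hyperparameters, and the function names are made up for illustration, and it updates all factors together rather than one feature at a time as the post describes. It is not the code from any of the repositories linked in this thread.

```python
import random


def train_funk_svd(ratings, n_users, n_items, n_factors=2,
                   lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Fit user and item factor vectors by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small positive random starting values (an illustrative choice).
    p = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    q = [[rng.uniform(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def predict(p, q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(pu * qi for pu, qi in zip(p[u], q[i]))


def rmse(p, q, ratings):
    """Root mean squared error over the given (user, item, rating) triples."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical toy data: (user index, movie index, rating on a 1-5 scale).
toy_ratings = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 2, 5), (3, 3, 4),
]

p, q = train_funk_svd(toy_ratings, n_users=4, n_items=4)
print("training RMSE:", rmse(p, q, toy_ratings))
```

On the real dataset the same loop runs over roughly 100 million ratings, which is why the implementations in this thread lean on Cython, C, C++, or D for the inner loop.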
arekp, almost 13 years ago
I wrote a 195-page monograph on the Netflix Prize, for people interested in that sort of stuff: http://arek-paterek.com/book
richardlblair, over 12 years ago
You indent by 8 characters.... I wanted to read your code but this will make my eyes bleed.

From PEP 8: "Use 4 spaces per indentation level."
skystorm, almost 13 years ago
Very nice. It might be helpful to (briefly) describe the actual techniques you tried in the readme file? At least that's the first thing I looked for...
viraj_shah, almost 13 years ago
This was incredibly kind of you to post up. It is great to see it public domain as many can learn from it. The Cython code looks scary though - 18k lines! May I ask how long you spent on this?
andreasvc, over 12 years ago
Nitpick: binary blobs like .pyc and .so don't belong in a code repository. Instead you would put a makefile or setup.py to compile the .pyx files.
jdleesmiller, almost 13 years ago
I also worked on this at uni and had lots of fun -- those lessons certainly look familiar! We were trying to mine Wikipedia for more information on the movies. The code's here:

http://code.google.com/p/wikipedia-netflix/wiki/WikipediaNetflix

It includes Wikipedia parsing stuff and a fairly fast C++ implementation of the very cool BellKor kNN algorithm.
JacobiX, over 12 years ago
Thanks for posting this. Your code combines two successful approaches: a latent factor model (SVD) and a neighborhood model :)

Here's my implementation of a recommender algorithm in C if someone is interested: https://github.com/GHamrouni/Recommender
surine, over 12 years ago
Mine is at https://github.com/hbcdev/Netflix in C++, got to the top 500. It runs just inside my 8GB PC :)
tlocke, almost 13 years ago
I had a look through this and was confused by this line:

ratings[ratings<1.0] = 1.0

in

https://github.com/alexbw/Netflix-Prize/blob/master/src/predict.py

Is it specific to NumPy? Or perhaps a Python trick I haven't seen before?
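The line in question is NumPy boolean mask indexing rather than a general Python feature: comparing an array with a scalar produces a boolean array, and assigning through that array overwrites only the elements where it is True. A minimal sketch, assuming `ratings` is a NumPy array of predicted ratings (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical predicted ratings; some fall outside the valid 1-5 range.
ratings = np.array([0.3, 1.0, 2.7, 5.4, -0.2])

# The comparison is applied element-wise and yields a boolean array.
mask = ratings < 1.0

# Assigning through the mask changes only the elements where mask is True.
ratings[mask] = 1.0
# ratings now holds 1.0, 1.0, 2.7, 5.4, 1.0

# np.clip bounds both ends in one call and returns a new array.
clipped = np.clip(np.array([0.3, 1.0, 2.7, 5.4, -0.2]), 1.0, 5.0)
# clipped holds 1.0, 1.0, 2.7, 5.0, 1.0
```

Flooring predictions this way makes sense for this dataset because valid ratings start at 1, so any lower prediction can only add error.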
tuananh, almost 13 years ago
Great for reference. However, I can't find the dataset anywhere on the internet :(
raheemm, almost 13 years ago
Good readme file. Liked the lessons learned.
marklit, almost 13 years ago
Does anyone have any thoughts on the lack of conformity to PEP 8? His code works and it's valid Python, but I feel that it's difficult to read.