科技回声 (Tech Echo) — a tech news platform built with Next.js, providing global tech news and discussion.


Choosing a First Machine Learning Project: Start by Reading or by Doing?

30 points, by danger, almost 15 years ago

4 comments

jacoblyles · almost 15 years ago
I'm not sure what you are imagining the scope of your first project to be, but I would recommend you begin by understanding and implementing some well-known algorithms. Start with a Gaussian mixture model trained with the EM algorithm. Then do linear and logistic regression, and the perceptron, which is pretty simple and is a simpler version of the widely used support vector machine.

The handwriting recognition database here is fantastic for testing a variety of simple ML models:

http://yann.lecun.com/exdb/mnist/

In our machine learning class, we would use data from the KDD Cup for our projects. Why don't you create a submission for old KDD Cups and see if your model can do better than random? 1998 is good for logistic and linear regression:

http://www.kdnuggets.com/meetings/kdd98/kdd-cup-98.html

Using the 2007 dataset, you can try out some of the matrix factorization methods that have worked well for the Netflix Prize:

http://www.cs.uic.edu/~liub/Netflix-KDD-Cup-2007.html

These should all be relatively small tasks. Learning how to interpret your results and iterate on your models to make them better will take longer than understanding the algorithms.
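The GMM-with-EM suggestion above can be sketched in a few lines of NumPy. This is only a minimal illustration: the two-component 1-D setup, the synthetic data, and the iteration count are all assumptions for the sketch, not part of the original comment.

```python
# Minimal sketch: a two-component 1-D Gaussian mixture fit with EM.
# The data and all parameter choices here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two clusters centered at 0 and 5
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

# Initial guesses: means, variances, mixing weights
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: posterior responsibility of each component for each point
    dens = pi * gauss(x[:, None], mu, var)        # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print(sorted(mu))  # the recovered means should land near 0 and 5
```

The same E-step/M-step loop generalizes to higher dimensions and more components; the point of doing it by hand is seeing the responsibilities pull the parameters toward each cluster.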
endtime · almost 15 years ago
Start by doing... pick a relatively easy algorithm, implement it, and fully understand why it's doing what it's doing. If you start out by implementing something with extreme math-fu, it may just seem like magic.

My first ML project was to implement STAGGER (Schlimmer and Granger, 1986), which is a very simple algorithm for handling concept drift. Then I trained it on the domain {red, green, blue} × {square, circle, triangle} × {small, medium, large}. I fed it 40 positive examples of small red square, then 40 positive examples of large green triangle, then 40 positive examples of medium blue circle, and watched the learned concept change. I understood how and why it worked, and that felt pretty good.
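The drift experiment described above can be reproduced with a toy learner. To be clear, the sketch below is NOT Schlimmer and Granger's actual STAGGER algorithm; it is a simple sliding-window majority learner, used only to show how the learned concept shifts as the stream of positive examples changes. The window size is an arbitrary choice.

```python
# Toy concept-drift demo on the domain
# {small, medium, large} x {red, green, blue} x {square, circle, triangle}.
# A sliding window of recent positive examples defines the current concept.
from collections import Counter, deque

WINDOW = 20  # only the most recent examples count (illustrative choice)
window = deque(maxlen=WINDOW)

def train(example):
    window.append(example)

def learned_concept():
    # Majority value of each attribute over the recent window
    return tuple(Counter(ex[i] for ex in window).most_common(1)[0][0]
                 for i in range(3))

# 40 positives of each target concept, in sequence, as in the comment
stream = ([("small", "red", "square")] * 40 +
          [("large", "green", "triangle")] * 40 +
          [("medium", "blue", "circle")] * 40)

snapshots = []
for t, ex in enumerate(stream, 1):
    train(ex)
    if t in (40, 80, 120):
        snapshots.append(learned_concept())

print(snapshots)  # the learned concept tracks each new block of examples
```

After each block of 40 examples, the window is dominated by the new concept, so the learner's hypothesis drifts along with the data, which is the behavior the comment describes watching.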
apurva · almost 15 years ago
For what it's worth, I think it's important to start with interesting problems. My first interaction with ML was an implementation of Naive Bayes for classifying spam, written from scratch (i.e., no libraries) and borrowing many ideas from PG's "A Plan for Spam". This is what got me really interested in the field, much more than randomly picking up topics would have; there are just so many areas to choose from. Another approach would be to read up on standard supervised learning techniques and observe how the parameters of these algorithms behave on datasets. Something like Weka really comes in handy if you wish to focus on analyzing the behavior of such techniques first. Best of luck!
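A from-scratch Naive Bayes spam filter of the kind described above fits in a screenful of Python. The tiny corpus, the Laplace smoothing, and the equal priors below are illustrative assumptions for the sketch, not details from the original post or from "A Plan for Spam".

```python
# Minimal Naive Bayes spam classifier, no libraries beyond the stdlib.
# Corpus and smoothing choices are illustrative.
import math
from collections import Counter

spam = ["buy cheap pills now", "cheap pills cheap"]
ham = ["meeting at noon", "lunch at noon tomorrow"]

def counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_c, ham_c = counts(spam), counts(ham)
vocab = set(spam_c) | set(ham_c)

def log_prob(word_counts, total, word):
    # Laplace smoothing so unseen words don't zero out the product
    return math.log((word_counts[word] + 1) / (total + len(vocab)))

def classify(msg):
    # Log priors from class frequencies, then sum per-word log-likelihoods
    s = math.log(len(spam) / (len(spam) + len(ham)))
    h = math.log(len(ham) / (len(spam) + len(ham)))
    for w in msg.split():
        s += log_prob(spam_c, sum(spam_c.values()), w)
        h += log_prob(ham_c, sum(ham_c.values()), w)
    return "spam" if s > h else "ham"

print(classify("cheap pills"))    # classified as spam
print(classify("lunch at noon"))  # classified as ham
```

Working in log space avoids underflow from multiplying many small probabilities, which is one of the practical lessons a from-scratch implementation teaches.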
raintrees · almost 15 years ago
Doing. If I just read, I don't get the other parts, like getting the debugging right, finding out the requirements of the environment, etc. Plus, I have less invested if I haven't typed the code in myself.

Edit: And continue the formal learning through reading (and experimentation). Later, get a handle on common conventions, as they usually help accuracy/readability.