Ask HN: What's a good machine learning independent study project?

11 points by shakeel_mohamed, more than 11 years ago

My school doesn't offer very many CS electives, and the ones they do offer aren't interesting to me.

So, I'm going to propose an independent study around machine learning, which I know nothing about right now. The highest level math classes I've taken are multivariable calculus and linear algebra. Other CS courses I've taken are data structures, OO design, OS & networks, web design, and database fundamentals. Coming up I'm taking languages & computation, and algorithms analysis.

I was thinking about making an AI that learns to play chess over time, but I don't know if that's too much work for an 11 week quarter.

What other projects would fit within the scope of 11 weeks?

5 comments

patio11, more than 11 years ago

YMMV on this, but I studied CS with an informal concentration on AI/natural languages. Here are some take-them-or-leave-them suggestions.

If you want to maximize the return on your time for this class, do a project which:

1) Uses one or many data sources which are publicly available but which, ideally, are not quite as simple to access as straight downloading a CSV file. A bit of practical experience with scraping, API use, or data processing doesn't hurt. Bonus points if you get a taste for working with large data sets.

2) You will not make an AI which learns to play chess in 11 weeks, or in 11 years. Just to set expectations. A more reasonable task for the same timeframe given your current skillset is e.g. "Given a large corpus of documents and a small number of them are hand-tagged, explore a few different approaches for classifying the remainder of the documents." A motivated undergrad can succeed at implementing a Bayesian classifier, but you will not advance the state of the art on chess.

3) A lot of academic projects focus on toy problems, like e.g. chess or a contrived simplification of a real system. There is no reason that you have to adopt this academic convention: consider picking a real system with consequences. There exist many websites which have information on them that actually impacts decisions which people care about -- wouldn't you rather learn to do analysis on that rather than pulling arbitrary trivia out of e.g. the British National Corpus (which, I rush to mention, is an excellent tool)?

4) Think about the presentation layer for findings in more detail than the typical academic paper, which spits out a sentence or two of summary stats and maybe graphs them. This might be an opportunity to have a bit of fun doing, e.g., a website which lets you search through your (voluminous) findings.

Putting it all together, you could imagine something like "I have developed a website and/or Chrome plugin which, when pointed at an Etsy item, predicts the likelihood that it will sell. Or it predicts the likelihood that a KickStarter campaign will succeed. Or it predicts the final sale value of an eBay auction -- better in some categories than others, see page 6. Or it successfully paints a red/blue map of the United States using no prior knowledge other than a geolocation database and the Twitter stream. Or it asks you ten questions about seemingly irrelevant trivia and then makes a surprisingly accurate prediction on how long it has been since you ate sushi."
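
For a sense of scale on point 2, here is a minimal sketch of a Bayesian document classifier: a multinomial naive Bayes with add-one smoothing, written against the standard library only. The tiny hand-tagged corpus, the `spam`/`ham` labels, and the function names are all made up for illustration; a real project would swap in its own scraped documents and probably a better tokenizer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train(tagged_docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in tagged_docs:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior under naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-tagged documents, standing in for a real corpus.
tagged = [
    ("cheap pills buy now", "spam"),
    ("limited offer buy cheap", "spam"),
    ("meeting notes for the project", "ham"),
    ("project schedule and meeting agenda", "ham"),
]
model = train(tagged)
```

Calling `classify(model, "buy cheap pills")` on this toy data picks `spam`. The interesting part of an 11-week project is what sits around a core like this: collecting the data, tagging a sample by hand, and comparing it against other approaches.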
angersock, more than 11 years ago

Simple idea:

Given a post text or image, give the three boards it was most likely posted to on 4chan.

Data is easily available on the 4chan API, and you can do things from very simple (matching word frequencies) to complex (NLP and image recognition).

EDIT:

Bonus round--train it to generate posts for a given board.
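
The "very simple" end of that range could look roughly like the sketch below: build one word-frequency profile per board, then rank boards by cosine similarity to the new post. The sample posts and function names here are invented for illustration and nothing below touches the actual 4chan API; fetching real posts is left as an assumption.

```python
import math
from collections import Counter


def word_freq(text):
    # Word counts for a piece of text, lowercased and split on whitespace.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(posts_by_board):
    """Merge every post on a board into one aggregate word-count profile."""
    profiles = {}
    for board, posts in posts_by_board.items():
        profile = Counter()
        for post in posts:
            profile.update(word_freq(post))
        profiles[board] = profile
    return profiles


def top_boards(profiles, text, k=3):
    """Return the k boards whose profiles are most similar to the text."""
    query = word_freq(text)
    ranked = sorted(
        profiles,
        key=lambda board: cosine(query, profiles[board]),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample posts keyed by board name; real data would come from scraping.
sample_posts = {
    "g": ["which linux distro should i install", "my gpu driver broke again"],
    "v": ["best game of the year", "this boss fight is too hard"],
    "ck": ["how do i cook rice", "best knife for cutting onions"],
    "fit": ["how many squats per week", "protein intake for bulking"],
}
profiles = build_profiles(sample_posts)
```

With this toy data, `top_boards(profiles, "linux gpu driver")` ranks `g` first. Raw counts favour common words, so a natural next step would be TF-IDF weighting or a proper classifier.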
rfergie, more than 11 years ago

I'm doing some work for a small UK-based charity.

I have several clustering/prediction problems in my pipeline at the moment.

Drop me a line (email in profile) if you are interested in having a crack at one of them. Should give you insight into all sorts of stuff apart from big data.
Irishsteve, more than 11 years ago

Students in my place usually end up going through all the content in http://www.cs.waikato.ac.nz/ml/weka/book.html

In terms of projects etc., there are about 4 or 5 assignments that range from spam detection to parameter setting optimisation.
sharemywin, about 11 years ago

Check out Restricted Boltzmann Machines and Deep learning.