科技回声

5 comments

patio11, over 11 years ago

YMMV on this, but I studied CS with an informal concentration on AI/natural languages. Here are some take-them-or-leave-them suggestions.

If you want to maximize the return on your time for this class, do a project which:

1) Uses one or many data sources which are publicly available but which, ideally, are not quite as simple to access as straight downloading a CSV file. A bit of practical experience with scraping, API use, or data processing doesn't hurt. Bonus points if you get a taste for working with large data sets.

2) You will not make an AI which learns to play chess in 11 weeks, or in 11 years. Just to set expectations. A more reasonable task for the same timeframe given your current skillset is e.g. "Given a large corpus of documents, a small number of which are hand-tagged, explore a few different approaches for classifying the remainder of the documents." A motivated undergrad can succeed at implementing a Bayesian classifier, but you will not advance the state of the art on chess.

3) A lot of academic projects focus on toy problems, e.g. chess or a contrived simplification of a real system. There is no reason that you have to adopt this academic convention: consider picking a real system with consequences. There exist many websites which have information on them that actually impacts decisions which people care about -- wouldn't you rather learn to do analysis on that than pull arbitrary trivia out of e.g. the British National Corpus (which, I rush to mention, is an excellent tool)?

4) Think about the presentation layer for findings in more detail than the typical academic paper, which spits out a sentence or two of summary stats and maybe graphs them. This might be an opportunity to have a bit of fun doing, e.g., a website which lets you search through your (voluminous) findings.

Putting it all together, you could imagine something like "I have developed a website and/or Chrome plugin which, when pointed at an Etsy item, predicts the likelihood that it will sell. Or it predicts the likelihood that a KickStarter campaign will succeed. Or it predicts the final sale value of an eBay auction -- better in some categories than others, see page 6. Or it successfully paints a red/blue map of the United States using no prior knowledge other than a geolocation database and the Twitter stream. Or it asks you ten questions about seemingly irrelevant trivia and then makes a surprisingly accurate prediction on how long it has been since you ate sushi."
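As a rough illustration of the Bayesian classifier mentioned in point 2, here is a minimal sketch (not from the comment itself) of a multinomial naive Bayes text classifier with add-one smoothing. The function names `tokenize`, `train`, and `classify`, the whitespace tokenizer, and the sample labels are all assumptions made for the example; a real project would likely start from an existing library and a much larger hand-tagged set.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def train(tagged_docs):
    """Build counts from (text, label) pairs, i.e. the small hand-tagged set."""
    word_counts = defaultdict(Counter)  # label -> word -> occurrences
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in tagged_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior for an untagged text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of tagged documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-tagged examples, purely for illustration.
tagged = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
]
model = train(tagged)
print(classify("buy cheap pills", model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents get longer than a few words.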


angersock, over 11 years ago

Simple idea: Given a post text or image, give the three boards it was most likely posted to on 4chan.

Data is easily available on the 4chan API, and you can do things from very simple (matching word frequencies) to complex (NLP and image recognition).

EDIT: Bonus round--train it to generate posts for a given board.
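The "very simple" end of that range, matching word frequencies, could look something like the sketch below. It is only one possible reading of the idea: it scores a post against each board by cosine similarity of raw word counts and returns the top three. The names `word_freqs`, `cosine`, and `top_boards` and the tiny sample data are hypothetical, and fetching real posts from any API is deliberately left out.

```python
import math
from collections import Counter


def word_freqs(texts):
    # Aggregate raw word counts over a list of post texts.
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_boards(post, posts_by_board, k=3):
    """Return the k boards whose word frequencies best match the post."""
    post_vec = word_freqs([post])
    scores = {
        board: cosine(post_vec, word_freqs(texts))
        for board, texts in posts_by_board.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical sample data standing in for posts collected per board.
sample = {
    "g": ["install linux on my laptop", "which linux distro"],
    "ck": ["how to cook rice", "best pasta recipe"],
    "v": ["new game release", "best game of the year"],
    "mu": ["new album release"],
}
print(top_boards("which linux laptop should i buy", sample))
```

Raw counts favour common words, so weighting terms (for example with TF-IDF) would be a natural next step before moving to heavier NLP.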

rfergie, over 11 years ago

I'm doing some work for a small UK-based charity. I have several clustering/prediction problems in my pipeline at the moment.

Drop me a line (email in profile) if you are interested in having a crack at one of them. Should give you insight into all sorts of stuff apart from big data.

Irishsteve, over 11 years ago

Students in my place usually end up going through all the content in http://www.cs.waikato.ac.nz/ml/weka/book.html

In terms of projects etc., there are about 4 or 5 assignments that range from spam detection to parameter-setting optimisation.

sharemywin, about 11 years ago

Check out Restricted Boltzmann Machines and deep learning.

Ask HN: What's a good machine learning independent study project?
