If you look at the cheat sheet and think, "Wow, I'd love to understand that," there's an excellent (albeit challenging) complete course on machine learning in Stanford's Engineering Everywhere online repository: http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
All the algorithms requiring training can be optimized using stochastic gradient descent, which is very effective for large data sets (see http://leon.bottou.org/research/stochastic). A short sketch of what a single-example SGD update looks like follows at the end of this comment.

Also, here are some additions for the online learning column:

* Online SVM: http://www.springerlink.com/index/Y8666K76P6R5L467.pdf

* Online Gaussian mixture estimation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.1698&rep=rep1&type=pdf

One more thing: why no random forests? Or decision tree ensembles of any sort?
KNN "no learning involved": one probably wants to cross-validate K at the least, if not learn the metric.

Some methods say online learning isn't applicable. As pointed out elsewhere, the objectives for K-means and mixture models could be fitted with stochastic gradient descent. In general there is always an online option: for example, keep a restricted set of items and throw out the ones that seem less useful as others come in. A sketch of one such online update is below.

(Aside: I have a very introductory lecture on machine learning on the web: http://videolectures.net/bootcamp2010_murray_iml/ — not for anyone who already knows the methods on this cheat sheet!)
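For instance, here is a rough sketch of sequential (online) K-means, where each incoming point triggers a single stochastic-gradient-style step on the K-means objective. Treat it as a sketch under simple assumptions: the lazy initialization and the 1/n step size are just one easy choice among many.

    import numpy as np

    def online_kmeans(stream, k):
        """Sequential K-means: each point updates only its nearest center."""
        centers, counts = None, np.zeros(k)
        for x in stream:
            x = np.asarray(x, dtype=float)
            if centers is None:                                # lazy init: jittered copies of the first point
                centers = x + 0.01 * np.random.randn(k, x.size)
            j = np.argmin(((centers - x) ** 2).sum(axis=1))    # assign to the nearest center
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]         # 1/n_j step keeps a running mean
        return centers

Only the centers and counts are kept in memory, so it handles a stream of any length.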
Nice summary; I like the format as well. However, the title of the cheat sheet is misleading, since (a) many of the algorithms listed can be used for non-linear classification, and (b) some of them can be considered supervised learning, such as naive Bayes and the perceptron, since they're trained with sample inputs and expected outputs (supervisory signals). The perceptron sketch below shows the point.

Otherwise, this is awesome. Hopefully you will add to it and make it available in web form.
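To illustrate point (b), the perceptron update is driven entirely by the label, which is what makes it supervised. A minimal sketch (illustrative only, not from the cheat sheet):

    import numpy as np

    def perceptron(X, y, epochs=10):
        """Classic perceptron; y holds labels in {-1, +1}, and every update needs the label."""
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for i in range(len(y)):
                if y[i] * (X[i] @ w + b) <= 0:    # misclassified under current weights
                    w += y[i] * X[i]              # move the boundary toward the correct side
                    b += y[i]
        return w, b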
Fantastic work, I have an ML exam coming up and this should really help. If I'm honest, it's one of the subjects I've struggled with the most. It seems experts in the field, while incredibly intelligent, have a hard time breaking the material down into structured and easily digestible pieces of information.