Machine Learning Classifier Gallery (Tom Fawcett)

37 points by chromophore over 15 years ago

4 comments

apu over 15 years ago
My observations:

* Instance-based methods can adapt to any kind of pattern if you have enough data. This is a well-known result in machine learning; the interesting question is simply how much data you need.
* It's quite remarkable how well RBF SVMs do in almost all cases. They even outperformed a 2nd-degree polynomial kernel SVM on a 2nd-degree polynomial!
* Logistic regression is pretty terrible for anything except "easy" data, i.e., simple linearly separable data.
* Random forests are sometimes OK, but tend to prefer axis-aligned data (at least in his formulation).
* Meta-point: all the tests shown use "clean" data, i.e., there are no "wrong" training examples. This is unrealistic, and in practice it makes a HUGE difference for some of these methods. For example, a lot of the rule-based methods get demolished by even a little wrong data. In contrast, SVMs have slack variables that can tolerate some amount of noise, and would probably shine even more on such data.
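A minimal Python sketch of the meta-point about "wrong" training examples, using k-nearest-neighbors as a stand-in for instance-based methods. It assumes a hypothetical toy concept (label 1 iff x >= 5 on a 10x10 integer grid), not any of the gallery's actual data sets, and the helper names `knn_predict` and `make_grid` are illustrative. A single mislabeled point flips a 1-NN prediction next to it, while a 5-NN majority vote outvotes the bad point.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_grid(flip=None):
    """Hypothetical concept: label 1 iff x >= 5, on a 10x10 integer grid."""
    data = []
    for x in range(10):
        for y in range(10):
            label = 1 if x >= 5 else 0
            if (x, y) == flip:
                label = 1 - label  # inject one "wrong" training example
            data.append(((x, y), label))
    return data


clean = make_grid()
noisy = make_grid(flip=(2, 2))
query = (2.1, 2.1)  # sits right next to the mislabeled point (2, 2)

print(knn_predict(clean, query, k=1))  # 0
print(knn_predict(noisy, query, k=1))  # 1: copies the mislabeled neighbor
print(knn_predict(noisy, query, k=5))  # 0: four correct neighbors outvote it
```

Raising k buys noise tolerance at the cost of blurring fine detail in the boundary, which is one reason the "how much data do you need?" question matters for these methods.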
mahmud over 15 years ago
OT:

What do the cognoscenti recommend for automatic text classification/categorization? I have been looking at spam filters, and they're mostly boolean predicates that return Spam/NotSpam along with a confidence number. I want to be able to do the same for a large number of categories.
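One common way to extend a spam-filter-style classifier to many categories is multinomial naive Bayes: it scores every category instead of just Spam/NotSpam, and the scores can be normalized into a per-category confidence. Below is a minimal stdlib-only sketch, assuming whitespace tokenization and Laplace smoothing; the class name, the tiny training corpus, and the category labels are hypothetical, and this is not a recommendation taken from the thread.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated, lowercased tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing
        self.doc_counts = Counter()             # documents per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, docs):
        for text, label in docs:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return (best_category, posterior probability of that category)."""
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                score += math.log((counts[w] + self.alpha) / (total_words + self.alpha * v))
            log_scores[label] = score
        # Softmax over the log scores turns them into a confidence per category.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        best = max(exp, key=exp.get)
        return best, exp[best] / z


docs = [
    ("goal match striker score", "sports"),
    ("team wins match", "sports"),
    ("election vote senate", "politics"),
    ("parliament vote bill", "politics"),
    ("compiler kernel python code", "tech"),
    ("python code release", "tech"),
]
model = NaiveBayes().fit(docs)
label, confidence = model.predict("python kernel code")
print(label)  # tech
print(confidence)
```

Adding a category only means adding labeled documents for it; nothing in the scoring is specific to two classes.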
alextp over 15 years ago
It bothers me that these data sets are very low-dimensional, noise-free, and pretty-picture-like. That pretty much excludes any interesting data to try to learn (after all, one could easily hand-code a classifier for most of these "concepts" that performs at 100%).
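To make this point concrete, a minimal sketch assuming a hypothetical 2-D concept with a 2nd-degree polynomial boundary (label 1 iff y > x^2, sampled on a 20x20 grid) and a hand-written rule that simply restates it. The rule scores 100% on clean labels. Corrupting every tenth label, a deterministic stand-in for real-world noise and not anything from the gallery, caps even this perfect rule at 90% measured accuracy.

```python
def concept(x, y):
    """Hypothetical concept: 1 iff y/10 > (x/10)**2, in exact integer arithmetic."""
    return 1 if 10 * y > x * x else 0


def hand_coded_rule(x, y):
    # A "manually coded classifier": here it is just the concept restated.
    return 1 if 10 * y > x * x else 0


def make_data(noisy=False):
    """400 grid points in [-1, 0.9]^2 (scaled by 10), optionally with flipped labels."""
    points = [(x, y) for x in range(-10, 10) for y in range(-10, 10)]
    data = []
    for i, (x, y) in enumerate(points):
        label = concept(x, y)
        if noisy and i % 10 == 0:
            label = 1 - label  # flip every tenth label
        data.append(((x, y), label))
    return data


def accuracy(rule, data):
    return sum(rule(x, y) == label for (x, y), label in data) / len(data)


print(accuracy(hand_coded_rule, make_data()))            # 1.0
print(accuracy(hand_coded_rule, make_data(noisy=True)))  # 0.9
```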
greg over 15 years ago
So what generalizations can we make from these plots? Do instance- and rule-based methods learn better? Random forest seems to be especially reliable.
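One caveat on random forest reliability, echoing apu's remark that forests tend to prefer axis-aligned data: each tree split tests a single feature against a threshold. The sketch below uses hypothetical toy concepts on a 10x10 integer grid (not the gallery's data or Fawcett's forest implementation) and finds the best single axis-aligned split, i.e. a decision stump. It is perfect on an axis-aligned boundary but reaches only 75% on a diagonal one, which deeper trees and forests can only approximate with a staircase of splits.

```python
def best_stump_accuracy(data):
    """Accuracy of the best one-split, axis-aligned classifier (a decision stump)."""
    n = len(data)
    best = 0.0
    for feature in (0, 1):
        for threshold in sorted({point[feature] for point, _ in data}):
            # Predict 1 when point[feature] >= threshold; also try the flipped polarity.
            correct = sum((point[feature] >= threshold) == bool(label) for point, label in data)
            best = max(best, correct / n, (n - correct) / n)
    return best


grid = [(x, y) for x in range(10) for y in range(10)]
axis_aligned = [((x, y), int(x >= 5)) for x, y in grid]  # boundary parallel to an axis
diagonal = [((x, y), int(x + y >= 9)) for x, y in grid]  # boundary at 45 degrees

print(best_stump_accuracy(axis_aligned))  # 1.0
print(best_stump_accuracy(diagonal))      # 0.75
```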