My observations:

* Instance-based methods can adapt to any kind of pattern if you have enough data. This is a well-known result in machine learning -- the interesting question is simply how much data you need.

* It's quite remarkable how well RBF SVMs do in almost all cases. They even outperformed a 2nd-degree polynomial kernel SVM on a 2nd-degree polynomial!

* Logistic regression is pretty terrible for anything except "easy" data -- simple linear data.

* Random forests are sometimes OK, but tend to prefer axis-aligned data (at least in his formulation).

* Meta-point: all the tests shown are with "clean" data, i.e., there are no "wrong" training examples. This is unrealistic, and in practice makes a HUGE difference for some of these methods. E.g., a lot of the rule-based methods get demolished by even a little bit of wrong data. In contrast, SVMs have a slack variable that can tolerate some amount of noise, and would probably shine even more on such data (see the sketch after this list).
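To make the slack-variable point concrete, here is a minimal sketch (assuming scikit-learn and numpy; the dataset, noise level, and parameters are made up for illustration, not taken from the article): flip a fraction of the training labels and compare a soft-margin RBF SVM against a single decision tree.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X, y = make_moons(n_samples=600, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Corrupt 10% of the training labels to simulate "wrong" examples.
flip = rng.rand(len(y_train)) < 0.10
y_noisy = np.where(flip, 1 - y_train, y_train)

# C controls the slack penalty: smaller C tolerates more margin violations,
# so a few mislabeled points don't dominate the decision boundary.
svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_noisy)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_noisy)

print("RBF SVM test accuracy:      ", svm.score(X_test, y_test))
print("Decision tree test accuracy:", tree.score(X_test, y_test))
```

The fully grown tree fits the flipped labels exactly and pays for it at test time, while the soft-margin SVM treats them as margin violations.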
OT:

What do the cognoscenti recommend for automatic text classification/categorization? I have been looking at spam filters, and they're mostly boolean-type predicates that return a Spam/NotSpam result along with a confidence number. I want to be able to do the same for a large number of categories.
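Not an expert recommendation, but one common baseline for "many categories, each with a confidence" is TF-IDF features plus a multinomial Naive Bayes classifier. A minimal sketch assuming scikit-learn; the tiny corpus and category names below are made up for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: documents and their categories (hypothetical).
docs = [
    "cheap meds buy now limited offer",
    "meeting agenda for the quarterly budget review",
    "new kernel patch improves scheduler latency",
    "win a free vacation click this link",
    "reviewer comments on the draft manuscript",
]
labels = ["spam", "work", "tech", "spam", "work"]

# TF-IDF turns text into term-weight vectors; Naive Bayes gives a
# probability per category instead of a single yes/no answer.
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(docs, labels)

new_doc = ["patch submitted for review, see the meeting notes"]
for category, p in zip(clf.classes_, clf.predict_proba(new_doc)[0]):
    print(f"{category}: {p:.2f}")
```

The same pipeline scales to hundreds of categories; the per-class probabilities play the role of the spam filter's confidence number.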
It bothers me that these data sets are very low-dimensional, noise-free, and pretty-picture-like. That pretty much excludes any interesting data to try to learn from (after all, one could easily hand-code a classifier for most of these "concepts" that performs at 100%; see the sketch below).
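For example, here is a minimal sketch of what "hand-coding a classifier" means for a toy 2D concept like "inside the unit circle" (the concept and data are made up for illustration, not taken from the article):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(1000, 2))
# Ground-truth concept: points inside the unit circle.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 <= 1.0).astype(int)

def hand_coded_classifier(points):
    # The same rule, written down by hand rather than learned from data.
    return (points[:, 0] ** 2 + points[:, 1] ** 2 <= 1.0).astype(int)

print("accuracy:", (hand_coded_classifier(X) == y).mean())  # 1.0 on noise-free data
```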