TechEcho
An Introduction to Data Mining

224 points by ssn almost 14 years ago

6 comments

PaulHoule almost 14 years ago
That tree's a good example of how not to do it.

For one thing, it's representing something that's not really a tree. "Support Vector Machine" and "Neural Networks" appear more than once as leaf nodes. Like any Chinese Encyclopedia classification, these schemes often succumb to the temptation to add nodes that say "Other" to keep the length of the branches constant. (They could probably think of some name for what neural nets and the SVM have in common. They've got all day to think about this stuff because they get paid to teach and to do research; it's not like they are harried practitioners.)

At some point I quit distinguishing regression and classification. There was a time when I knew some tricks for classification and regression seemed mysterious. Once I got over my mental block it seemed pretty obvious that much of my box of tricks worked for regression too.

Another issue is that it's not a good graphic for the web. You could probably print this out and read it, but you can't take it in at a glance on the web, which destroys the purpose of it being an infographic.
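A minimal sketch of the "same box of tricks" point, assuming k-nearest neighbours as the example method (the comment does not name one). The neighbour search is shared; only the final aggregation step differs between classification and regression. The function names and the toy fruit data are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import dist
from statistics import mean


def nearest_targets(train, query, k):
    """Return the targets of the k training points closest to query."""
    ranked = sorted(train, key=lambda pair: dist(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train, query, k=3):
    # Classification: majority vote among the neighbours' labels.
    return Counter(nearest_targets(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    # Regression: average of the neighbours' numeric targets.
    return mean(nearest_targets(train, query, k))


# Hypothetical toy data: features are (weight in grams, length in cm).
POINTS = [(150, 7), (160, 8), (170, 7), (120, 18), (130, 20), (125, 19)]
LABELS = ["apple", "apple", "apple", "banana", "banana", "banana"]
PRICES = [0.50, 0.60, 0.70, 0.20, 0.25, 0.30]

fruit_labels = list(zip(POINTS, LABELS))
fruit_prices = list(zip(POINTS, PRICES))

if __name__ == "__main__":
    query = (155, 8)
    print("predicted label:", knn_classify(fruit_labels, query))
    print("predicted price:", knn_regress(fruit_prices, query))
```

Both predictors call the same `nearest_targets` helper, which is one way to read the claim that a classification technique often carries over to regression with a small change at the output.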
asrk almost 14 years ago
I think it's a nice overview. I don't mind that certain things appear more than once as "leaf nodes", because it shows that the same methods can be used for different things. Visualizing this with only one "leaf node" each would have been messier, in my opinion. I also think the differentiation of Classification and Regression is justified, because while Regression can be used for Classification, it's not quite the same thing.

I'm not an expert, but in my opinion the difference is that Classification is sorting a basket of apples and bananas into two separate baskets, while Regression is predicting which fruit will come out of the basket after X apples and Y bananas.
dvse almost 14 years ago
A quick rule of thumb would be to never trust anyone who claims "regression" is a separate topic from "classification". Oh yes, and in this case it is beyond awful.
jasonkolb almost 14 years ago
This is really, really cool, thanks Dr. Sayad. I love how this allows people to see the entire process and then drill into each step as deeply as they want to go.

I really think this is the way complex topics need to be taught. It's so easy to get caught in the weeds and lose track of where you are in the overall picture; an approach like this is extremely helpful.
lightoverhead almost 14 years ago
That's a great overview of data mining. I am wondering if you can somehow insert multivariate analysis into this flow chart. Multivariate analysis may be a good substitute for clustering methods. Thank you for providing such a clear picture.
saedsayad almost 14 years ago
I enjoyed reading all your comments. I would be glad to moderate the "wiki" version.