An Introduction to Data Mining

224 points by ssn, almost 14 years ago

6 comments

PaulHoule, almost 14 years ago
That tree's a good example of how not to do it.

For one thing, it's representing something that's not really a tree. "Support Vector Machine" and "Neural Networks" appear more than once as leaf nodes. As in any Chinese Encyclopedia classification, the authors succumb to the temptation to add nodes that say "Other" to keep the length of the branches constant. (They could probably think of some name for what neural nets and the SVM have in common; they've got all day to think about this stuff because they get paid to teach and to do research, it's not like they are harried practitioners.)

At some point I quit distinguishing regression and classification. There was a time when I knew some tricks for classification, and regression seemed mysterious. Once I got over my mental block, it seemed pretty obvious that much of my box of tricks worked for regression too.

Another issue is that it's not a good graphic for the web. You could probably print this out and read it, but you can't take it in at a glance on the web, which destroys the purpose of it being an infographic.
asrk, almost 14 years ago
I think it's a nice overview. I don't mind that certain things appear more than once as "leaf nodes", because it shows that the same methods can be used for different things. Visualizing this with only one "leaf node" each would have been messier, in my opinion. I also think the differentiation of Classification and Regression is justified, because while Regression can be used for Classification, it's not quite the same thing.

I'm not an expert, but in my opinion the difference is that Classification is sorting a basket of apples and bananas into two separate baskets, while Regression is predicting which fruit will come out of the basket after X apples and Y bananas.
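The remark that regression can be used for classification can be illustrated with a minimal sketch. It is not taken from the linked material: the toy data, the function names (`fit_line`, `predict_value`, `predict_label`), and the 0.5 threshold are all assumptions made for illustration. It fits an ordinary least-squares line to 0/1 labels and thresholds the continuous output to obtain a class.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_value(model, x):
    """Regression view: return a continuous prediction."""
    a, b = model
    return a * x + b


def predict_label(model, x, threshold=0.5):
    """Classification view: threshold the same continuous prediction."""
    return 1 if predict_value(model, x) >= threshold else 0


# Hypothetical toy data: one feature, labels 0 ("apple") and 1 ("banana").
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
model = fit_line(xs, ys)

print(predict_value(model, 2.0), predict_label(model, 2.0))
print(predict_value(model, 7.5), predict_label(model, 7.5))
```

The same fitted model serves both purposes; only the final step differs, which is one way to read both sides of the disagreement in this thread.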
dvse, almost 14 years ago
A quick rule of thumb would be to never trust anyone who claims "regression" is a separate topic from "classification". Oh yes, and in this case it is beyond awful.
jasonkolb, almost 14 years ago
This is really, really cool, thanks Dr. Sayad. I love how this allows people to see the entire process and then drill into each step as deeply as they want to go.

I really think this is the way complex topics need to be taught. It's so easy to get caught in the weeds and lose track of where you are in the overall picture, so an approach like this is extremely helpful.
lightoverhead, almost 14 years ago
That's a great overview of data mining. I am wondering if you can somehow insert multivariate analysis into this flow chart. Multivariate analysis may be a good substitute for clustering methods. Thank you for providing such a clear picture.
saedsayad, almost 14 years ago
I enjoyed reading all your comments. I would be glad to moderate the "wiki" version.