That tree's a good example of how not to do it.<p>For one thing, it's representing something that's not really a tree. "Support Vector Machine" and "Neural Networks" appear more than once as leaf nodes. Like any Chinese Encyclopedia classification, it succumbs to the temptation to add nodes that say "Other" to keep the length of the branches constant. (They could probably think of some name for what neural nets and the SVM have in common -- they've got all day to think about this stuff because they get paid to teach and to do research; it's not like they are harried practitioners.)<p>At some point I quit distinguishing regression and classification. There was a time when I knew some tricks for classification, and regression seemed mysterious. Once I got over my mental block, it seemed pretty obvious that much of my box of tricks worked for regression too.<p>Another issue is that it's not a good graphic for the web. You could probably print this out and read it, but you can't take it in at a glance on the web, which defeats the purpose of it being an infographic.
I think it's a nice overview. I don't mind that certain things appear more than once as "leaf nodes", because it shows that the same methods can be used for different things. Visualizing this with only one "leaf node" each would have been messier, in my opinion.
I also think the distinction between Classification and Regression is justified, because while Regression can be used for Classification, it's not quite the same thing.<p>I'm not an expert, but in my opinion the difference is that Classification is sorting a basket of apples and bananas into two separate baskets, while Regression is predicting which fruit will come out of the basket after X apples and Y bananas.
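To make "Regression can be used for Classification" concrete, here is a minimal sketch, not anything taken from the infographic: fit an ordinary least-squares line to 0/1 labels and threshold the prediction at 0.5. The fruit lengths, the labels, and the names `fit_line` and `classify` are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def classify(x, a, b, threshold=0.5):
    """Turn the regression output into a class label by thresholding."""
    return 1 if a * x + b >= threshold else 0


# Hypothetical toy data: fruit length in cm; label 0 = apple, 1 = banana.
lengths = [6.0, 7.0, 8.0, 15.0, 17.0, 19.0]
labels = [0, 0, 0, 1, 1, 1]

a, b = fit_line(lengths, labels)
predictions = [classify(x, a, b) for x in lengths]
print(predictions)
```

The fitting step is plain regression; only the final threshold makes it a classifier, which is roughly why the two feel like the same box of tricks and still "not quite the same thing".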
A quick rule of thumb would be to never trust anyone who claims "regression" is a separate topic from "classification". Oh yes, and in this case it is beyond awful.
This is really, really cool, thanks Dr. Sayad. I love how this allows people to see the entire process and then drill into each step as deeply as they want to go.<p>I really think this is the way complex topics need to be taught. It's so easy to get caught in the weeds and lose track of where you are in the overall picture, so an approach like this is extremely helpful.
That's a great overview of data mining. I am wondering if you can somehow insert multivariate analysis into this flow chart.
Multivariate analysis may be a good substitute for clustering methods.
Thank you for providing such a clear picture.