PCA is a pretty neat technique. It's quite old too, introduced by Karl Pearson in 1901.<p>Basically, you find the direction (a unit vector) along which the data has the highest variance. Then you find a second direction, orthogonal to the first, along which the remaining variance is highest, and so on.<p>You end up with a set of vectors that together explain all of the variance, that give uncorrelated coordinates for the data (they're orthogonal eigenvectors of the covariance matrix), and that are ranked by how much variance each one explains.<p>This can be useful in regression to get rid of correlated variables. It also helps when there are more columns than rows, which breaks OLS regression: you can drop the low-variance components and regress on the few that remain.<p>Consider a new town that you want to get to know as quickly as possible. What is the best method? You start with the longest street, then take a left and travel the next longest street, and so on. You can get a pretty good idea about the town without seeing it all.
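<p>For anyone who wants to poke at this, here is a minimal NumPy sketch of the idea above. It is only one way to do it: it uses an SVD of the centered data rather than whatever your favorite library does internally. The helper name `pca`, the toy data, and the choice to keep 3 components in the regression part are all made up for illustration.

```python
import numpy as np

def pca(X):
    """Hypothetical helper: PCA of a data matrix X (rows = samples, columns = variables)."""
    Xc = X - X.mean(axis=0)                       # center each column
    # SVD of the centered data; rows of Vt are orthonormal principal directions,
    # ordered by decreasing singular value
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)  # sample variance along each direction
    scores = Xc @ Vt.T                            # coordinates of the data along each direction
    return Vt, explained_variance, scores

# Toy data (an assumption): the second column is strongly correlated with the first
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=200), rng.normal(size=200)])

components, var, scores = pca(X)
print("share of variance per component:", var / var.sum())
print("covariance of the scores:\n", np.cov(scores, rowvar=False).round(6))

# More columns than rows: OLS has no unique solution, so regress on a few components
# instead (principal component regression). k = 3 is an arbitrary choice here.
Xw = rng.normal(size=(10, 30))
yw = rng.normal(size=10)
Vt_w, var_w, scores_w = pca(Xw)
k = 3
Z = scores_w[:, :k]                               # keep only the k highest-variance components
beta_z, *_ = np.linalg.lstsq(Z, yw - yw.mean(), rcond=None)
beta = Vt_w[:k].T @ beta_z                        # coefficients mapped back to the original columns
```

The covariance matrix of the scores should come out diagonal (up to floating point noise), which is the "uncorrelated" part, and the variances on its diagonal are sorted from largest to smallest. In the wide case, the regression runs on 3 columns instead of 30, so the fit is well defined even though there are only 10 rows.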