Principal Component Analysis

42 points by karamazov almost 13 years ago

7 comments

lbarrow almost 13 years ago
The author is basically using a linear algebra tool for creating orthogonal basis vectors from a matrix of stock prices. (PCA is like an eigenvector decomposition, but because it can be computed via the SVD it works on rectangular matrices too. In fact, unlike many operations, it's very fast on unbalanced rectangular matrices!) Since these vectors are, by definition, uncorrelated, they can be very useful in building CAPM-balanced stock portfolios.

Using PCA is great in this situation, but people often run into traps when using these sorts of spectral-decomposition methods on real-world data.

The most obvious is that they try to interpret what the vectors "represent". Sometimes this is reasonable -- if you did a similar experiment on the stock prices of energy companies, the strongest vector probably really would be closely correlated with the price of oil. But aside from unusual situations like that, interpreting the "meaning" of spectral vectors is a fool's errand.
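A minimal sketch of the point above, assuming NumPy and entirely made-up data (250 days of returns for 6 hypothetical stocks driven by one shared factor). It is not the article's code; it only illustrates that the SVD of a tall rectangular matrix yields orthonormal directions whose scores are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 250 daily returns for 6 stocks (rows = days, columns = stocks).
# A shared "market" factor makes the columns correlated.
market = rng.normal(size=(250, 1))
returns = 0.8 * market + 0.5 * rng.normal(size=(250, 6))

# Center each column, then take the thin SVD of the rectangular matrix.
centered = returns - returns.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

components = Vt                    # rows are orthonormal principal directions (6 x 6)
scores = centered @ Vt.T           # each day's coordinates along those directions (250 x 6)
explained = s**2 / np.sum(s**2)    # fraction of total variance per component

# The score columns are mutually uncorrelated: their covariance matrix is diagonal.
score_cov = np.cov(scores, rowvar=False)
```

Printing `explained` shows how much of the variance each direction carries; it says nothing about what a direction "means", which is the trap described above.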
btilly almost 13 years ago
PCA is a very useful tool in lots of places. But be warned that when you use it on stocks, you'll find correlations, make your investment, then discover that during a financial crisis all sorts of things that were not previously correlated now are. Thus your analysis falls apart at exactly the moment you would least want it to.

Incidentally, if you take answers to a wide variety of questions that are meant to test intelligence, your score on the first component of a PCA should be fairly well correlated with IQ or your SAT score. The second component should be reasonably well correlated with the difference between your math and verbal scores on the SAT. And people have much less variability on the third component than on the first two.
robert00700 almost 13 years ago
Nice to see PCA in an HN article, it's a very powerful tool.

For those struggling to get the example in this article, I find PCA easier to understand given visual examples, and in fewer dimensions (try http://en.wikipedia.org/wiki/File:GaussianScatterPCA.png).

Note how this dataset is two-dimensional in nature, and PCA yields two vectors. The first gives the direction of greatest variation, and the second gives the direction of the remaining variation, orthogonal to the first.

An awesome use of PCA is for face recognition, a method called 'Eigenfaces': http://en.wikipedia.org/wiki/Eigenface
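The two-dimensional picture can be reproduced roughly as follows. This is only a sketch on synthetic data, assuming NumPy; the 30-degree tilt and the standard deviations of 3.0 and 0.5 are arbitrary choices for illustration, not values taken from the linked image.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-D data: a Gaussian cloud stretched along the 30-degree direction.
angle = np.deg2rad(30)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
points = (rng.normal(size=(2000, 2)) * [3.0, 0.5]) @ rotation.T

# Eigen-decomposition of the 2x2 covariance matrix.
cov = np.cov(points, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort so the largest comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

first, second = eigvecs[:, 0], eigvecs[:, 1]

# Angle of the first component in degrees, folded into [0, 180)
# because the sign of an eigenvector is arbitrary.
first_angle = np.rad2deg(np.arctan2(first[1], first[0])) % 180
```

Here `first` should point close to the direction the cloud was stretched along, and `second` is perpendicular to it.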
tel almost 13 years ago
PCA goes far deeper than meets the eye. For instance, it's a well-known phenomenon that too much dimensionality can actually drive predictor performance down to chance, but PCA can mitigate that. It's basically the bread and butter of practical unsupervised learning.
misiti3780 almost 13 years ago
PCA can also be used for compression:

http://www.willamette.edu/~gorr/classes/cs449/Unsupervised/pca.html

Also worth noting is that Apache Mahout supports PCA, so you can perform this type of analysis on large matrices pretty easily these days.
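As a rough illustration of the compression idea (not taken from the linked page, and not Mahout), here is a NumPy sketch on synthetic data. The dataset and the choice of `k = 3` are assumptions made for the example: each row is stored as its coordinates along the top `k` principal directions and reconstructed from them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 samples in 50 dimensions that lie near a 3-D subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
data = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

mean = data.mean(axis=0)
U, s, Vt = np.linalg.svd(data - mean, full_matrices=False)

k = 3
basis = Vt[:k]                     # top-k principal directions (k x 50)
codes = (data - mean) @ basis.T    # compressed representation (200 x k)
restored = codes @ basis + mean    # lossy reconstruction (200 x 50)

# Relative reconstruction error; small here because the data is nearly rank 3.
rel_error = np.linalg.norm(data - restored) / np.linalg.norm(data - mean)
```

Storing `codes`, `basis`, and `mean` takes 800 numbers instead of the original 10,000. The loss is small only because this toy data really is close to three-dimensional; real data may need a larger `k`.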
mturmon almost 13 years ago
This expository post lined up the 6 stocks and computed the SVD of the time history of all 6 together. This shows how the 6 stocks correlate.

You can do it another way. Run a sliding window across one single stock, line up all the resulting vectors, and then take the SVD of (err...apply PCA to) that. That is, if you started with a single-stock time history:

    x1, x2, x3...

then form:

    z1 = [x1 x2 x3]
    z2 = [x2 x3 x4]
    z3 = [x3 x4 x5]

etc., and use PCA on the z's instead of the x's. (In practice, you'd make the z's much longer.)

This will extract seasonal variability (on all kinds of scales -- not just annual). One name for it is Singular Spectrum Analysis (http://en.wikipedia.org/wiki/Singular_spectrum_analysis).
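The sliding-window construction can be sketched like this, assuming NumPy. The helper name `trajectory_matrix`, the synthetic period-12 series, and the window length of 36 are all hypothetical choices for illustration; this covers only the decomposition step, not a full Singular Spectrum Analysis.

```python
import numpy as np

def trajectory_matrix(x, window):
    """Stack overlapping windows of x as rows: z_i = [x_i, ..., x_{i+window-1}]."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i:i + window] for i in range(len(x) - window + 1)])

# Hypothetical series: a period-12 cycle plus a little noise.
rng = np.random.default_rng(3)
t = np.arange(240)
series = np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=t.size)

Z = trajectory_matrix(series, window=36)    # shape (205, 36)
Z_centered = Z - Z.mean(axis=0)
U, s, Vt = np.linalg.svd(Z_centered, full_matrices=False)

# A pure sinusoid spans a 2-D subspace of the window space, so the first
# two components should carry almost all of the variance.
top_two_share = (s[:2] ** 2).sum() / (s ** 2).sum()
```

Plotting the first two rows of `Vt` against the window index should show two sinusoids of period 12, a quarter cycle apart, which is how a periodic pattern shows up in this kind of analysis.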
eykanal almost 13 years ago
For what it's worth, the best PCA tutorial I've seen online is this Stack Exchange answer, which uses plots to describe the technique:

http://stats.stackexchange.com/a/2700/2019

PCA is nothing more than a "basis shift", or changing where the x and y axes are placed. This image-based tutorial makes the idea very intuitive.
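The "basis shift" view can be checked numerically. The sketch below assumes NumPy and uses made-up correlated 2-D data: it expresses the same points in the principal axes and confirms that nothing but the coordinate system changed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical correlated 2-D data.
data = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.6]])
centered = data - data.mean(axis=0)

_, _, Vt = np.linalg.svd(centered, full_matrices=False)
new_coords = centered @ Vt.T    # same points, expressed in the principal axes

# An orthonormal change of basis keeps lengths and total variance, and is reversible.
same_lengths = np.allclose(np.linalg.norm(centered, axis=1),
                           np.linalg.norm(new_coords, axis=1))
same_total_var = np.isclose(centered.var(axis=0).sum(),
                            new_coords.var(axis=0).sum())
recovered = np.allclose(new_coords @ Vt, centered)

# In the new axes the two coordinates are uncorrelated.
new_corr = np.corrcoef(new_coords, rowvar=False)[0, 1]
```

The only thing that differs between `centered` and `new_coords` is which axes the numbers are measured along; the new axes are chosen so that the coordinates no longer co-vary.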