
Principal Component Analysis for Dummies

147 points by jackkinsella, over 11 years ago

13 comments

nroman, over 11 years ago
When I was studying this in college I always found the "Eigenfaces" example very enlightening (http://en.wikipedia.org/wiki/Eigenface).

In case you're not familiar with them, the basic idea is to treat an image of a face as a very high-dimensional vector and then do what amounts to PCA on a collection of them. I'm leaving off a few steps, but the resulting eigenvectors, converted back into images, helped me grasp what was going on in a much more intuitive fashion.
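A minimal sketch of the eigenfaces idea described above, assuming scikit-learn's Olivetti faces dataset and an arbitrary component count (neither choice comes from the comment or the article):

```python
# Eigenfaces sketch: PCA on face images treated as high-dimensional vectors.
# Dataset and n_components are illustrative assumptions.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA

faces = fetch_olivetti_faces()    # 400 images of 64x64 = 4096 pixels (downloaded on first use)
X = faces.data                    # each face is a 4096-dimensional row vector

pca = PCA(n_components=16)
pca.fit(X)

# Each principal component is itself a 4096-vector, i.e. an "eigenface":
# reshaped back to 64x64 it can be viewed as an image.
eigenfaces = pca.components_.reshape((16, 64, 64))
print(eigenfaces.shape, pca.explained_variance_ratio_[:5])
```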
thearn4, over 11 years ago
Another angle: PCA is given by computing the SVD (a more general analog of the eigenvalue/eigenvector decomposition) of a mean-centered representation of the data. Some idiosyncrasies of PCA then become obvious: we can't tell whether a computed result is the sought result or its reflection/negation, because the SVD is only unique up to sign.

This is also closer to its actual implementation: while it's true that you technically need the eigenbasis of the covariance matrix, you should not actually form the covariance matrix to get there...
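A small sketch of that point: PCA via the SVD of the mean-centered data, never forming the covariance matrix, with the sign ambiguity visible when the result is compared against the covariance eigendecomposition. The toy data is made up for illustration:

```python
# PCA via SVD, plus a check against the covariance eigendecomposition.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # toy data

Xc = X - X.mean(axis=0)                    # mean-center; do NOT form Xc.T @ Xc
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                            # principal directions (rows), descending
explained_variance = s**2 / (len(X) - 1)   # equals the covariance eigenvalues

# Same answer via the covariance matrix, but each eigenvector may come out as
# v or -v: the reflection/negation ambiguity mentioned above.
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
print(np.allclose(np.abs(evecs[:, ::-1].T), np.abs(components), atol=1e-8))
```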
avn2109, over 11 years ago
It seems to me that explanations of technical topics in natural, everyday language are very valuable. People who would be turned off by a formalized explanation and dense symbolic manipulation can still get a lot out of this. Bravo.
adamnemecek, over 11 years ago
Does anyone else have an extremely hard time trying to read it due to being blinded by the bright yellow?
neltnerb, over 11 years ago
I like the goal, thanks for the presentation. Personally, I'd have preferred an example on high-dimensional data like curve fitting, where this is actually most important, but that's perhaps because I'm a nerd and I like graphs.

FTIR data analysis is a fantastic example for PCA: each principal factor ends up (probably) being the spectrum of one of the major real physical components. But maybe this is too abstract?

A less abstract one might be a distribution of test scores. Your actual dataset is "number" versus "score", and you could show two gaussians, one at a low score and one at a high score. Then you could show that across three exams you always see the same scores, but with different intensities. That would let you compute that the principal components are those two gaussians. Then you can hypothesize that each group is a collection of students who study together, and so they get similar scores. Or something like that.

Anyway, no intent to be a wet blanket. It's a nice writeup, and it is nice of you to share.
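A rough sketch of that exam-score example, with made-up numbers: three exam histograms built as different mixtures of the same two Gaussian profiles, and PCA (via the SVD) showing that the variation is two-dimensional. Note the principal components span the same subspace as the two Gaussians rather than reproducing them exactly:

```python
# Exam-score illustration: same two hidden score profiles, different intensities.
import numpy as np

scores = np.linspace(0, 100, 101)
gauss = lambda mu, sigma: np.exp(-0.5 * ((scores - mu) / sigma) ** 2)
low_group, high_group = gauss(45, 8), gauss(80, 6)      # two hidden "study group" profiles

# Each exam = a different weighting of the same two profiles.
weights = np.array([[1.0, 0.4], [0.6, 0.9], [0.3, 1.2]])
exams = weights @ np.vstack([low_group, high_group])    # shape (3 exams, 101 score bins)

# PCA via SVD of the mean-centered exam matrix.
centered = exams - exams.mean(axis=0)
_, s, components = np.linalg.svd(centered, full_matrices=False)
print(s)  # essentially two nonzero singular values: a two-dimensional structure

# The top two rows of `components` span the same subspace as the two Gaussian
# profiles (an orthogonal rotation of them, not the profiles themselves).
```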
jmdeldin, over 11 years ago
I found Lindsay I. Smith's "A tutorial on Principal Components Analysis" [1] really useful because it covers the mathematics behind PCA but gives enough linear algebra background for it to be understandable by those with distant or weak math backgrounds (e.g., me).

[1] http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
micro_cam, over 11 years ago
PCAs are cool, but I find it maddening when people convert sparse data (like counts of how many words are shared between documents) into dense distance data in order to use it.

You can shortcut the whole process by finding the smallest nonzero eigenvalue/eigenvector pairs of the graph Laplacian (Fiedler vectors). You need a sparse solver that can find the smallest values/vectors instead of the largest (like LOBPCG), but that is faster anyway.
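A possible sketch of that shortcut, assuming SciPy's sparse tools and its LOBPCG solver; the random graph and the number of requested eigenpairs are arbitrary, and in practice you would also supply a preconditioner:

```python
# Spectral embedding of a sparse graph via the smallest Laplacian eigenpairs.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import lobpcg

rng = np.random.default_rng(0)
n = 200
# Sparse symmetric adjacency (stand-in for e.g. shared-word counts between documents).
A = sp.random(n, n, density=0.02, random_state=0)
A = ((A + A.T) * 0.5).tocsr()
L = laplacian(A)                           # graph Laplacian L = D - A

k = 4                                      # trivial zero eigenvector + a few coordinates
X = rng.standard_normal((n, k))            # random initial block of vectors
vals, vecs = lobpcg(L, X, largest=False, tol=1e-8, maxiter=500)

order = np.argsort(vals)
vals, vecs = vals[order], vecs[:, order]
# Drop the (near-)zero eigenpair; the remaining columns are the Fiedler-style coordinates.
embedding = vecs[:, 1:]
print(vals)
```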
ksk, over 11 years ago
Wow, this brings back memories. I remember using PCA for feature extraction from image data to be used in SVM-based image classification. Though as I recall, PCA added a huge tax on the processing time and provided, in comparison, a small boost in accuracy. (IIRC we split the data 4:1 into training & classification.)
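For illustration, a hypothetical version of that pipeline using scikit-learn's small digits dataset and an 80/20 split standing in for the 4:1 split; none of the numbers reflect the original experiment:

```python
# PCA feature extraction feeding an SVM classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# PCA reduces the 64 pixel features before the SVM sees them; the extra fit is
# where the processing-time cost mentioned above comes from.
clf = make_pipeline(StandardScaler(), PCA(n_components=30), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```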
hcarvalhoalves, over 11 years ago
Thank you for this primer; it's related to something I'm studying right now and is much easier to understand.
dnautics, over 11 years ago
Doesn't the spread-out-ness then depend on the units? If your data have unit X on one axis and unit Y on another, then how can you say that the "maximal spread-out-ness" is in any given direction, when you can merely adjust the scale on one axis and alter how numerically spread out it looks?
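A quick numerical sketch of the point behind this question, with synthetic data: rescaling the units of one axis changes which direction PCA reports as most spread out, unless the columns are standardized first:

```python
# PCA is scale-dependent: changing one feature's units changes the first component.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)             # feature in unit X
y = 0.5 * x + rng.normal(0.0, 0.5, 1000)   # feature in unit Y

def first_pc(data):
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[0]

data = np.column_stack([x, y])
print(first_pc(data))                      # some direction mixing x and y
print(first_pc(data * [1.0, 1000.0]))      # rescale y's units: PC1 snaps toward y

# Standardizing each column (z-scores) makes the answer unit-free again.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)
print(first_pc(standardized))
```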
therobot24, over 11 years ago
Yes, there are hundreds of these online. Though I must admit the images are well done to convey the main point of dimensionality reduction; those who don't necessarily understand eigenvectors or covariance matrices will be able to see what's happening between the lines.
bsaul, over 11 years ago
That's brilliant. I studied the maths behind all this years ago, and only now do I find an intuitive explanation of it. Many thanks.
AsymetricCom, over 11 years ago
These intuitive guides are great for those of us who've built an understanding of higher math through practical application on computers rather than through formal, academic means.