TechEcho
Linear compression in Python: PCA vs unsupervised feature selection

77 points by efavdb almost 7 years ago

7 comments

wjn0 almost 7 years ago
I found this line confusing:

> The printed lines above show that both algorithms capture more than 50% of the variance exhibited in the data using only 4 of the 50 stocks.

Based on the sklearn PCA documentation [1], this has nothing to do with the coefficients on individual stocks, and for PCA should read more like: "[...] capture more than 50% of the variance exhibited in the data using only 4 components [...]", which is not the same thing.

1. http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
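
To illustrate the distinction between "4 components" and "4 stocks", here is a minimal sketch using synthetic data rather than the article's stock data. The factor model, array shapes, and variable names are assumptions made only for this example; the point is that each PCA component places weight on every column, so four components are not four individual stocks.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for daily returns (assumption, not the article's data):
# 500 days x 50 "stocks", driven by 3 hidden factors plus noise.
n_days, n_stocks, n_factors = 500, 50, 3
factors = rng.normal(size=(n_days, n_factors))
loadings = rng.normal(size=(n_factors, n_stocks))
returns = factors @ loadings + 0.5 * rng.normal(size=(n_days, n_stocks))

pca = PCA(n_components=4).fit(returns)

# Fraction of total variance captured by the first 4 components.
explained = pca.explained_variance_ratio_.sum()

# Each row of components_ is a weight vector over all 50 columns.
nonzero_weights = np.count_nonzero(pca.components_[0])

print(f"variance explained by 4 components: {explained:.3f}")
print(f"stocks with nonzero weight in component 0: {nonzero_weights}")
```

Reconstructing the data from these four components still requires all 50 input columns, whereas a feature-selection method keeps four of the original columns outright.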
samfisher83 almost 7 years ago
Does it even make sense to run PCA on the percentage change of a stock? To me it would make more sense to use it with physical properties of the underlying company. PCA helps you reduce a higher-dimensional space to a lower-dimensional one so you can group stocks together. I am a little confused by what the author is trying to do.
thanatropism almost 7 years ago
I wish people were better acquainted with the literature, e.g. https://www.nowpublishers.com/article/Details/ECO-002

(Ed: yeah, that's just a sample of the book but has a large bibliography at the end.)
rubatuga almost 7 years ago
I can't seem to make the COD reach 1.0:

    >>> selector.ordered_cods
    [0.43298218, ... , 0.5068577, 0.5068577]

Do you think this is a problem/bug?
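
For context on what a COD of 1.0 would mean, here is a minimal NumPy sketch. It assumes the COD is the average R² obtained when every column is linearly regressed on the selected columns; that definition, the helper name `average_cod`, and the random data are assumptions for illustration and are not taken from the library behind `selector`. It does not diagnose the output above.

```python
import numpy as np

def average_cod(X, selected):
    """Mean R^2 over all columns when each column is regressed
    on the selected columns (hypothetical helper, assumed definition)."""
    Xc = X - X.mean(axis=0)              # center every column
    S = Xc[:, selected]                  # predictors: the selected columns
    coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
    resid = Xc - S @ coef
    r2 = 1.0 - (resid ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return r2.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # synthetic data, 6 independent columns

cod_subset = average_cod(X, [0, 1])      # only two columns selected
cod_all = average_cod(X, list(range(6))) # every column selected

print(cod_subset, cod_all)
```

Under this assumed definition the value reaches 1.0 only when the selected columns linearly span all the others, which is trivially true once every column has been selected.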
squigs25 almost 7 years ago
Another technique for unsupervised feature selection is Principal Feature Analysis (PFA): http://venom.cs.utsa.edu/dmz/techrep/2007/CS-TR-2007-011.pdf
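
A rough sketch of the PFA idea as it is commonly described: fit PCA, treat each original feature's row of loadings as a point, cluster those points with k-means, and keep the feature closest to each cluster center. The function name, parameters, and random data below are hypothetical, and the details may differ from the linked report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def principal_feature_analysis(X, n_features, n_components=None, random_state=0):
    """Return indices of n_features original columns (illustrative sketch)."""
    q = n_components or n_features
    pca = PCA(n_components=q).fit(X)
    # One row per original feature: its loadings on the q components.
    rows = pca.components_.T
    km = KMeans(n_clusters=n_features, n_init=10,
                random_state=random_state).fit(rows)
    selected = []
    for k in range(n_features):
        members = np.flatnonzero(km.labels_ == k)
        dists = np.linalg.norm(rows[members] - km.cluster_centers_[k], axis=1)
        # Keep the feature whose loading vector is nearest the cluster center.
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # synthetic data, 10 features
selected = principal_feature_analysis(X, n_features=3)
print(selected)
```

Unlike plain PCA, the result is a list of original column indices, so the retained features stay directly interpretable.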
octopod almost 7 years ago
This dataset could be interesting, as it consists of stocks and cryptos: https://vectorspace.ai/recommend/datasets
closed almost 7 years ago
This title seems a bit confusing, since PCA is a form of unsupervised feature selection (or rather, feature weighting).

The title seems like it has the form "<Specific method> vs <Broader category method fits in>".