Why do we Sometimes get Nonsense-Correlations between Time-Series?

15 points by ahalan over 12 years ago

3 comments

EvanMiller over 12 years ago
The money quote of this article is:

"I propose to term such correlations... the serial correlations for the given series."

The basic idea here is that observations in a time-series are not actually independent because each observation is highly correlated with the previous observation, and so the usual significance tests and standard errors do not apply.

Some links if you're interested in learning more about analyzing serial correlation and time-series data:

http://en.wikipedia.org/wiki/Autoregressive_model

http://en.wikipedia.org/wiki/Newey–West_estimator

http://en.wikipedia.org/wiki/Prais-Winsten_transformation
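A rough way to see the effect described above is to simulate it. The sketch below is not from the paper: it assumes Gaussian random walks as the serially correlated series, uses the large-sample approximation 1.96 / sqrt(n) as the naive 5% cutoff for |r|, and the function names (`pearson_r`, `random_walk`, `false_alarm_rate`) are made up for illustration. It counts how often two independent series look "significantly" correlated under the usual test.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def white_noise(n, rng):
    """Independent draws: each observation ignores the previous one."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, rng):
    """Cumulative sum of independent draws: each observation is the
    previous one plus noise, so neighbours are highly correlated."""
    total = 0.0
    walk = []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        walk.append(total)
    return walk


def false_alarm_rate(make_series, n=100, trials=500, seed=0):
    """Fraction of independent pairs whose |r| exceeds the naive cutoff.

    The cutoff 1.96 / sqrt(n) is an approximation that is only valid
    when the observations within each series are independent.
    """
    rng = random.Random(seed)
    cutoff = 1.96 / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        r = pearson_r(make_series(n, rng), make_series(n, rng))
        if abs(r) > cutoff:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("white noise :", false_alarm_rate(white_noise))
    print("random walks:", false_alarm_rate(random_walk))
```

For white noise the rate should land near the nominal 5%. For random walks it should be far higher, even though every pair is independent by construction. That is the sense in which the usual significance tests do not apply.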
andrewcooke over 12 years ago
i think the argument being made is that if you sample a continuous signal at a frequency higher than where most of the power in the signal's spectrum lies, then those samples are not independent. so standard statistical tests that assume independent measurements overestimate significance.

so if you have two smooth, continuous signals, over a relatively short time (compared to the underlying process that is generating them) then you should simply ask whether they both slope in the same general way (if you like, there's a 50:50 chance that both go up (or down) compared to one going one way and one the other). both sloping in the same way is not terribly significant (50:50 likely by chance). and that doesn't change even if you sample like crazy, and generate lots and lots of points, which appear to show a hugely significant correlation.

[edit as i slowly grok this better] more generally, correlation coefficient isn't a good tool to use for comparing signals. it should be used for comparing random samples from populations (a signal is not a population). and i don't think people use it that way these days. so i guess this paper won out.

but i may have missed something, or be simply wrong, because this was published 100 years after fourier died, yet when i scanned it i saw nothing that mentioned fourier analysis, which seems like an obvious way (see above) to phrase this (but i may be biased, since i guess fourier analysis boomed once machines existed to compute ffts).
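The "sample like crazy" point can be illustrated with a small sketch. Everything here is an assumption chosen for the demonstration, not something from the paper: the two slow sinusoids, their frequencies and phases, the one-unit observation window, and the helper names. Both signals happen to slope upward over the window. The code samples them at increasing density and reports the correlation coefficient together with the naive t-statistic r * sqrt((n - 2) / (1 - r^2)), which treats every sample as an independent observation.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slow_a(t):
    """Hypothetical smooth signal: covers a tenth of its period on [0, 1]."""
    return math.sin(2 * math.pi * 0.10 * t + 0.3)


def slow_b(t):
    """A second, unrelated smooth signal that also rises on [0, 1]."""
    return math.sin(2 * math.pi * 0.07 * t + 1.0)


def sample(signal, n, duration=1.0):
    """n evenly spaced samples of signal over [0, duration], ends included."""
    return [signal(duration * i / (n - 1)) for i in range(n)]


def naive_t(r, n):
    """t-statistic for r that wrongly assumes n independent observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def oversampling_table(sizes=(10, 100, 1000)):
    """(n, r, naive t) for the same two signals at each sampling density."""
    rows = []
    for n in sizes:
        r = pearson_r(sample(slow_a, n), sample(slow_b, n))
        rows.append((n, r, naive_t(r, n)))
    return rows


if __name__ == "__main__":
    for n, r, t in oversampling_table():
        print(f"n={n:5d}  r={r:.4f}  naive t={t:.1f}")
```

r should barely move as n grows, since the underlying curves are the same, while the naive t-statistic grows roughly like sqrt(n). The extra points add apparent significance without adding information, and the only real evidence is still "both went up".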
greenyoda over 12 years ago
Note: 63-page PDF of a mathematical paper published in 1926.