I'm not sure about this. The author says that adding a common component to two random time series doesn't make them correlated. But that's not true: by construction it does, at least as measured by any of the simple correlation tests. It's a complicated subject explained in a confusing way.
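To make the "by construction" claim concrete, here's a quick numpy sketch (my own illustration, not from the article): two independent noise series have r near 0, but adding the same shared component to both makes them clearly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two independent noise series: correlation should be near 0.
x = rng.normal(size=n)
y = rng.normal(size=n)

# Add the SAME common component to both.
common = rng.normal(size=n)
x_c = x + common
y_c = y + common

r_before = np.corrcoef(x, y)[0, 1]
r_after = np.corrcoef(x_c, y_c)[0, 1]
print(r_before, r_after)
```

With unit-variance components, the shared term contributes half of each series' variance, so the induced correlation sits around r = 0.5.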
What the author is trying to explain are the concepts of cointegration and stationarity. A useful introduction here: <a href="http://www.uta.edu/faculty/crowder/papers/drunk%20and%20dog.pdf" rel="nofollow">http://www.uta.edu/faculty/crowder/papers/drunk%20and%20dog....</a>
Something I've repeatedly found useful: when you're debugging and have a conjecture, don't only look for evidence that a correlation/causation is present, but also look for evidence that it isn't.<p>Doing a very quick A/B test helps too.
Isn't the author exaggerating in the other direction? There is obviously correlation between the two time series. Sure, who's saying there is causation (as mentioned in the article, there can be a third random variable that the first two depend on)? But also, who's to say <i>there's no causation</i>? Is it OK to always remove the correlated part of the two time series? What if that's the interesting part and the explanation you're looking for?
This is called spurious correlation. It's well known in financial / economic time-series analysis. The lesson is that you never measure the correlation between the PRICE LEVELS of products, instead you measure the correlation between the daily/weekly/etc CHANGE IN PRICE LEVELS.<p>A famous example of this:<p>The tale of David Leinweber, which is related in the excellent new book "Quantitative Value," illustrates this point about "stupid data miner tricks." Leinweber sifted through a United Nations CD covering the economic data of 140 countries. He found that butter production in Bangladesh explained 75 percent of the variation of the S&P 500 Index. Not satisfied, he found that if he added a broader category of global dairy products, the correlation would rise to 95 percent. Then he added a third variable, the population of sheep, and found that he had now explained 99 percent of the variation in the S&P 500 for the period 1983-'99.<p>(<a href="http://www.cbsnews.com/news/what-butter-production-means-for-your-portfolio/" rel="nofollow">http://www.cbsnews.com/news/what-butter-production-means-for...</a>)
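The levels-vs-changes lesson is easy to demonstrate (a quick numpy sketch of my own, not from the article): two completely independent random walks often show a large correlation in their levels purely by chance, while the correlation of their period-to-period changes stays near 0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000

# Two completely independent random walks (think: price levels).
a = np.cumsum(rng.normal(size=n))
b = np.cumsum(rng.normal(size=n))

# Correlating the LEVELS can produce a large r purely by chance,
# because both series wander persistently away from their means.
r_levels = np.corrcoef(a, b)[0, 1]

# Correlating the CHANGES (daily returns) recovers the truth: near 0.
r_changes = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
print(r_levels, r_changes)
```

The exact value of r_levels varies wildly from run to run (that's the point: it's meaningless), whereas r_changes is reliably close to 0.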
Does this mean that if I apply this algorithm and two or more time series are still similar, they are in fact correlated? I find this test fascinating.
also: statistical tests on correlation coefficients don't test whether the correlation is "important" or not --- they only test whether the correlation is reliably different from 0.00<p>So a small correlation (e.g. r=0.10) can still be "statistically significant" at p&lt;0.001, but all this means is that r is reliably different from 0.00 --- it doesn't mean r is big
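One way to see this numerically (a sketch using the standard t-statistic for testing a Pearson r against 0, t = r&middot;sqrt((n-2)/(1-r&sup2;))): the same tiny r is "significant" with a huge sample and not with a small one, because the test measures distance from zero, not size.

```python
from math import sqrt

def t_stat(r, n):
    # t-statistic for H0: true correlation = 0
    return r * sqrt((n - 2) / (1 - r ** 2))

# r = 0.10 with n = 10,000: t is around 10, far past any cutoff,
# so p is tiny -- yet r explains only 1% of the variance.
t_big = t_stat(0.10, 10_000)

# The same r = 0.10 with n = 50: t is under 1, nowhere near significant.
t_small = t_stat(0.10, 50)
print(t_big, t_small)
```

Significance here is driven almost entirely by n, which is why a "p&lt;0.001" correlation can still be practically worthless.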