> Often referred to as the cornerstone of statistics<p>Well... it's often referred to as the central theorem of statistics every time you say its name: what's central is the theorem, not the limit. It was Pólya who first called it that, "zentraler Grenzwertsatz" (central limit theorem).<p>> Why the Central Limit Theorem Works<p>Well... I don't think that's really an explanation of why e^(-x^2/2) is such a privileged function. Why would any distribution converge to a normal distribution?<p>It essentially boils down to the Fourier transform. Take the Fourier transform (the characteristic function) of the standardized sample mean: if you standardise to mean 0 and variance 1, there is no linear term, and keeping only the quadratic term gives (1 - t^2/2n)^n, which converges to the exponential e^(-t^2/2). That's the Gaussian, which is its own Fourier transform.<p><a href="https://en.wikipedia.org/wiki/Central_limit_theorem#Proof_of_classical_CLT" rel="nofollow">https://en.wikipedia.org/wiki/Central_limit_theorem#Proof_of...</a><p>In other words, because the Gaussian is its own Fourier transform, it is the fixed point that sample means converge to.
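That convergence is easy to check numerically. A minimal sketch (pure Python, assuming standardized summands so only the quadratic Taylor term survives): the characteristic function of the standardized sample mean behaves like (1 - t^2/2n)^n, which approaches exp(-t^2/2) as n grows.

```python
import math

def phi_approx(t, n):
    # Quadratic-term approximation of the characteristic function
    # of a standardized sample mean of n i.i.d. draws.
    return (1 - t * t / (2 * n)) ** n

t = 1.5
for n in (10, 100, 10_000):
    print(n, phi_approx(t, n))
# The Gaussian limit it converges to:
print("limit", math.exp(-t * t / 2))
```

As n increases, the printed values close in on exp(-t^2/2), the Gaussian's own characteristic function.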
I always liked this visual representation of the central limit theorem:
<a href="http://blog.vctr.me/posts/central-limit-theorem.html" rel="nofollow">http://blog.vctr.me/posts/central-limit-theorem.html</a>. There is a faster one here (I think written in R): <a href="http://vis.supstat.com/2013/04/bean-machine/" rel="nofollow">http://vis.supstat.com/2013/04/bean-machine/</a><p>These are computer simulations of Galton boxes:
<a href="http://en.wikipedia.org/wiki/Bean_machine" rel="nofollow">http://en.wikipedia.org/wiki/Bean_machine</a>
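The same machine is a few lines of code. A toy Galton box sketch (pure stdlib; bin counts and row count are arbitrary choices): each ball bounces left or right at every peg with equal probability, so its final bin is Binomial(rows, 1/2), which the CLT says looks increasingly Gaussian as the number of rows grows.

```python
import random
from collections import Counter

def galton(balls=10_000, rows=12, seed=0):
    # Each ball's bin = number of "right" bounces out of `rows` pegs.
    rng = random.Random(seed)
    return Counter(sum(rng.randint(0, 1) for _ in range(rows))
                   for _ in range(balls))

bins = galton()
for k in range(13):
    # Crude text histogram: one '#' per 50 balls in the bin.
    print(f"{k:2d} {'#' * (bins[k] // 50)}")
```

The printed histogram bulges in the middle bins, tracing out the familiar bell shape.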
I think the first half of the article, showing how this works with a given sample distribution, is pretty good. I don't think it does much to build intuition at the end, though.<p>It's also worth pointing out that there are distributions for which the central limit theorem doesn't hold, because they lack a finite variance (e.g. the sum of samples from a Lorentzian (Cauchy) distribution will again be Lorentzian, not Gaussian).
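A quick counterexample sketch (pure stdlib; sample sizes are arbitrary): the standard Cauchy/Lorentzian has no finite mean or variance, so the CLT's hypotheses fail, and the mean of n Cauchy samples is itself standard Cauchy no matter how large n is.

```python
import math
import random
import statistics

def cauchy(rng):
    # Inverse-CDF sampling: the tangent of a uniform angle in
    # (-pi/2, pi/2) is a standard Cauchy variate.
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(42)
# 200 sample means, each averaging 1000 Cauchy draws.
means = [statistics.fmean(cauchy(rng) for _ in range(1000))
         for _ in range(200)]
# Averaging never tames the tails: wild outliers persist.
print(min(means), max(means))
```

Contrast with a finite-variance distribution, where means of 1000 draws would all sit within a few hundredths of the true mean.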
I have a series of basic questions I include in any data science interview, and one is "please describe what the central limit theorem says in simple, high-level terms". It's absolutely amazing how many people who have great credentials can't do this. I get a lot of "any distribution becomes normal when you sample it enough". This is nonsensical and shows a lack of understanding of the theorem.<p>Please, if you claim to know stats, understand what the central limit theorem says. It's a pretty incredible and useful theorem.
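For contrast, a sketch of what the theorem actually says (pure stdlib; the exponential distribution and sample sizes are my choice of illustration): the data themselves never become normal, but the *sample mean* of n i.i.d. draws with finite variance is approximately Normal(mu, sigma^2/n).

```python
import random
import statistics

rng = random.Random(1)
n = 500
# 2000 sample means, each over n draws from Exp(1): mu = sigma = 1.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(2000)]
# The underlying data are heavily right-skewed, yet the means
# cluster tightly and symmetrically around mu = 1, with spread
# close to sigma / sqrt(n) = 1 / sqrt(500) ~ 0.045.
print("mean of means:", round(statistics.fmean(means), 3))
print("stdev of means:", round(statistics.stdev(means), 3))
```

The data stay exponential; only the distribution of the averages is (approximately) normal. That distinction is exactly what the "any distribution becomes normal" answer misses.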
My introduction to the central limit theorem was that sums of many independent random contributions tend to result in a Gaussian distribution. This is so general that one is surprised when one finds non-Gaussian distributions (canonical example: the stock market, whose returns are heavy-tailed).<p>I attended a lecture by Mandelbrot (shortly before he died) where he spoke at length about this. Take a look at stable distributions and the generalized central limit theorem, which covers sums of variables without finite variance.
To me, the core idea is this (given that one draws, over and over, from a bunch of independent and identically distributed events):<p>There are more ways for a mixture of outcomes to happen than there are ways for any one outcome to happen over and over.
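The counting intuition above can be sketched with coin flips (pure stdlib): out of 2^20 equally likely sequences, vastly more land near 10 heads than at the extremes, simply because there are more ways to mix outcomes than to repeat one.

```python
from math import comb

# Number of length-20 coin-flip sequences with exactly k heads.
n = 20
for k in (0, 5, 10):
    print(k, comb(n, k))
# -> 0 heads:  1 way
#    5 heads:  15504 ways
#    10 heads: 184756 ways
```

The counts peak sharply at the middle, and as n grows that peak approaches the Gaussian shape (de Moivre-Laplace, the original special case of the CLT).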
An interesting bit of trivia for computer history buffs:<p>Alan Turing independently re-proved the Central Limit Theorem while still an undergrad in 1934, unaware that Lindeberg had already published a proof in 1922.