I liked this but as other commenters have pointed out it's purely <i>binomial</i> at this stage. The great thing about the central limit theorem is that it is <i>more general</i> than just the limiting Binomial case.<p>So, there's this thing called the cumulant-generating function. It's pretty much defined for any random variable X. If you want to get technical it is the logarithm of the Fourier transform of a probability density function f(x). You're on HN so you probably know the first two jargon words, "logarithm" and "Fourier transform". A "probability density" just means that f(x) dx is the probability for X to be in the interval (x, x + dx). The Fourier transform puts us into a "frequency space" indexed by some variable k, so we can write the CGF as some function c(k), or in other words:<p><pre><code> c(k) = ln[ ∫ dx f(x) exp(i k x) ]
</code></pre>
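If you'd rather poke at this numerically than stare at the integral, here is a minimal sketch (numpy assumed; the standard-normal sanity check and the sample size are my choices, not part of the derivation) that estimates c(k) by averaging exp(i k x) over samples and taking the log:<p><pre><code> # Minimal sketch (numpy assumed): estimate c(k) = ln E[exp(i k X)] by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)

def empirical_cgf(samples, k):
    # Average exp(i k x) over the samples, then take the complex log.
    return np.log(np.mean(np.exp(1j * k * samples)))

# Sanity check on a standard normal, whose CGF is exactly -k**2 / 2.
x = rng.standard_normal(200_000)
for k in (0.5, 1.0, 2.0):
    print(k, empirical_cgf(x, k), -k**2 / 2)
</code></pre>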
So for the "step left/right" variable which is -1 with probability 1/2 and +1 with probability 1/2, c(k) = ln[cos(k)]. (It can get a little messy when you ask what happens when cos(k) crosses 0 etc, but this function is infinitely-often differentiable on a disc centered at 0 which is all that we need.)<p>It also turns out that since the sum of all the probability ∫ dx f(x) = 1, you can just evaluate this for k=0 as c(0) = ln[ 1 ] = 0. The cumulants are derivatives evaluated at k=0:<p><pre><code> c'(0) = i E[X] = i μ (where μ is the "expectation")
c''(0) = - (E[X²] - E[X]²) = -σ² (where σ² is the "variance")
c'''(0) = -i E[(X - μ)³] = -i σ³ γ (where γ is the "skewness")
</code></pre>
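If you don't trust the signs and the factors of i, a small symbolic check reproduces the pattern [i μ, −σ², −i σ³ γ]. This is a sketch with sympy assumed; the Exponential(1) example, whose characteristic function is 1/(1 − i k), is my addition for a case with nonzero skew.<p><pre><code> # Sketch (sympy assumed): differentiate two CGFs at k = 0 and read off cumulants.
import sympy as sp

k = sp.symbols('k', real=True)
cgfs = {
    "step +/-1":      sp.log(sp.cos(k)),      # μ = 0, σ² = 1, γ = 0
    "Exponential(1)": -sp.log(1 - sp.I * k),  # μ = 1, σ² = 1, third cumulant = 2
}
for name, c in cgfs.items():
    derivs = [sp.diff(c, k, order).subs(k, 0) for order in (1, 2, 3)]
    print(name, derivs)   # expect [i*μ, -σ², -i*σ³*γ]
</code></pre>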
It is not very hard to prove that if I have a sum of two <i>independent</i> random variables X + Y = Z, then their CGFs add like cz(k) = cx(k) + cy(k). This is the Fundamental Awesomeness of Cumulants: cumulants are pseudo-linear; they're linear in <i>independent</i> random variables.<p>It is also not hard to prove that if I have a scalar multiplication U = X / n, then cu(k) = cx(k / n). Combining these, the mean M = (X1 + X2 + ... + Xn) / n of n independent, identically distributed random variables looks like:<p><pre><code> cm(k) = n c(k / n)
</code></pre>
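You can sanity-check this scaling rule by simulation. Here is a sketch (numpy assumed; n = 10 and the trial count are arbitrary choices of mine): the empirical CGF of the sample mean of ±1 steps should track n ln[cos(k/n)].<p><pre><code> # Sketch (numpy assumed): compare the empirical CGF of the mean of n steps
# against the prediction cm(k) = n * c(k/n) with c(k) = ln(cos(k)).
import numpy as np

rng = np.random.default_rng(1)
n, trials = 10, 200_000
means = rng.choice([-1.0, 1.0], size=(trials, n)).mean(axis=1)

for k in (0.5, 1.0, 2.0):
    empirical = np.log(np.mean(np.exp(1j * k * means)))
    predicted = n * np.log(np.cos(k / n))
    print(k, empirical, predicted)
</code></pre>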
Now if you know calculus, you know the rest. Taylor-expand around k = 0 to find:<p><pre><code> cm(k) = i μ k − (σ²/n) k²/2 − i (σ³ γ / n²) k³/6 + ...
</code></pre>
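If you'd rather let a CAS do the expansion, here is a sketch (sympy assumed) using the Exponential(1) CGF −ln(1 − i k), where μ = σ² = 1 and σ³ γ = 2, so the series should come out as i k − k²/(2n) − i k³/(3n²) + ..., with the powers of 1/n accumulating term by term:<p><pre><code> # Sketch (sympy assumed): expand n*c(k/n) for c(k) = -ln(1 - i k), the
# Exponential(1) CGF, and watch each order of k pick up another 1/n.
import sympy as sp

k = sp.symbols('k', real=True)
n = sp.symbols('n', positive=True)
c = -sp.log(1 - sp.I * k)
cm = n * c.subs(k, k / n)
print(sp.series(cm, k, 0, 4))   # expect I*k - k**2/(2*n) - I*k**3/(3*n**2) + O(k**4)
</code></pre>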
We see a <i>geometric attenuation</i> in this series: each new term picks up another factor of 1/n. If we drop every term from c'' on, we're left with cm(k) ≈ i μ k, which inverts to a point mass at M = μ. That's boring, so we keep one more term, to find:<p><pre><code> cm(k) ~= i E[X] k − (Var[X] / n) k² / 2
</code></pre>
This is extra-good if f(x) is symmetric about the mean (so that the skewness vanishes), but it is also pretty good even if the distribution is skewed.<p>We can now invert all of the steps to return to a probability density: you exponentiate, so you get exp(i μ k) exp(−k² σ² / 2n), then you inverse-Fourier-transform, which turns the phase factor exp(i μ k) into a shift x → x − μ and which turns the Gaussian into another Gaussian, so you get a Gaussian centered at x = μ.<p>In other words you get approximately a normal distribution with mean E[X] and variance Var[X] / n, with the error given by convolutions of higher-order terms, and the first correction disappearing when the distribution is symmetric about μ (or otherwise non-skew).
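To see the punchline numerically, here is one last sketch (numpy assumed; Exponential(1) with n = 50 and the trial count are my choices): the means of a skewed distribution already sit close to a normal with mean μ and variance σ²/n.<p><pre><code> # Sketch (numpy assumed): means of n = 50 Exponential(1) draws (μ = σ² = 1,
# skewed) compared against the predicted normal with mean 1 and variance 1/n.
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 200_000
means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print("mean of the means:", means.mean(), "(predicted 1)")
print("variance of means:", means.var(), "(predicted", 1 / n, ")")
# A normal puts about 68.3% of its mass within one standard deviation of the mean.
print("within 1 sigma:   ", np.mean(np.abs(means - 1.0) < np.sqrt(1 / n)))
</code></pre>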