Last year I looked into Stein's paradox more deeply because I could not wrap my head around it. The baseball example is easy. It is quite intuitive that some of those who scored well probably just did so by luck and are actually somewhat less skilled (and vice versa for the lowest scorers). However, Stein's paradox is deeper than that. According to Wikipedia, it apparently also applies to unrelated variables, for example the population of Ulan Bator, the temperature on Mars, and the yearly chocolate consumption of Switzerland. This runs counter to all my intuition, and I found the following weakness in the theory behind Stein's paradox: the improved estimators all seem to depend on things like the mean or the variance among the included variables. For example, if you estimate Ulan Bator to have 1 million inhabitants and the temperature on Mars to be 200 Kelvin, you would adjust the former estimate a little downwards and the latter estimate a little upwards (towards the common mean). However, this implicitly assumes that the population and the temperature have been drawn from a distribution whose mean exists. My guess is that this is not the case. Obviously, you can always calculate a sample mean and a sample variance, but they might be meaningless if the sample stems from a distribution such as the Cauchy.
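One way to probe this worry is a small simulation. The sketch below fixes three unrelated true values by hand (they are not drawn from any distribution), adds Gaussian noise, and compares the total squared error of the raw measurements with that of the plain James-Stein estimator that shrinks toward the origin. The assumptions are mine, not from the article: every quantity has been rescaled so its measurement noise has variance 1, the noise is Gaussian, and the values in `theta` are arbitrary illustrative numbers.

```python
import random

def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimate shrinking toward the origin (needs len(x) >= 3)."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in x]

def squared_error(estimate, theta):
    """Total squared error summed over all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))

def compare_risk(theta, n_trials=20_000, seed=0):
    """Average total squared error of raw measurements vs. James-Stein."""
    rng = random.Random(seed)
    raw_total = js_total = 0.0
    for _ in range(n_trials):
        # One noisy measurement per quantity, noise variance 1 (assumed).
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        raw_total += squared_error(x, theta)
        js_total += squared_error(james_stein(x), theta)
    return raw_total / n_trials, js_total / n_trials

# Fixed, unrelated "true" values; arbitrary numbers chosen for illustration.
raw_risk, js_risk = compare_risk([1.0, -0.5, 2.0])
print(f"raw: {raw_risk:.3f}  James-Stein: {js_risk:.3f}")
```

Note that the comparison is on the error summed over all three quantities; nothing here says any single estimate gets better. With true values that are huge relative to the noise (a million inhabitants, say), the shrinkage factor is almost exactly 1 and the gain becomes negligible.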
The "paradox" is that if high batting averages are predicted to get worse and low averages are predicted to get better, then these predictions work better than just guessing they will stay the same.

This doesn't seem like a paradox to me. Rather, it seems kind of obvious. If the statistics are anything like a random walk, then random walk theory (usually revisiting the starting point) would predict this.
In the case of the baseball example, at least, isn't the increased accuracy of the Stein estimator a result of incorporating a good Bayesian prior into the "observed average" result of the individual players -- that prior being the batting average of a "typical" player (i.e., the average of the averages)?
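That reading can be made concrete with a minimal sketch of shrinkage toward the grand mean, where the "prior" center is estimated from the data itself. The batting averages and the 45 at-bats below are hypothetical numbers, the binomial approximation for the noise variance is my assumption, and the function name is made up for illustration. The `(k - 3)` factor is the usual form when shrinking toward an estimated mean, so it needs at least four players; the clamp at zero is the positive-part variant.

```python
def shrink_toward_grand_mean(y, sigma2):
    """Shrink each observed average toward the average of the averages.

    y      -- observed averages, one per player
    sigma2 -- assumed common sampling variance of each observed average
    """
    k = len(y)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 values")
    grand_mean = sum(y) / k
    spread = sum((v - grand_mean) ** 2 for v in y)
    if spread == 0.0:
        return list(y)  # all values identical, nothing to shrink
    # Shrinkage factor; clamped at 0 (positive-part variant).
    c = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    return [grand_mean + c * (v - grand_mean) for v in y]

# Hypothetical early-season averages after 45 at-bats each.
averages = [0.400, 0.356, 0.311, 0.267, 0.244, 0.200]
at_bats = 45
grand_mean = sum(averages) / len(averages)
# Binomial approximation to the noise variance of one average (assumption).
sigma2 = grand_mean * (1.0 - grand_mean) / at_bats

shrunk = shrink_toward_grand_mean(averages, sigma2)
for before, after in zip(averages, shrunk):
    print(f"{before:.3f} -> {after:.3f}")
```

Every player is pulled toward the group average by the same factor, so the ranking is preserved; only the spread shrinks.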
I have seen several mentions of the James-Stein estimator being almost an empirical Bayesian estimator, but this article made it clearer. Thanks for sharing.