This might be too off-topic, but just kill it if you think it is. Otherwise, here goes:<p>I have a question about regression to the mean.<p>Suppose you have a set of pairs (a, b) corresponding to students in a class: a = the student's score on the first midterm, b = the score on the second midterm.<p>If you plot the pairs with a on the x-axis and b on the y-axis, then fit the least-squares line, you get an upward-sloping line.<p>The slope of that line should be less than 1, indicating regression to the mean.<p>If you plot b on the x-axis and a on the y-axis, the slope is necessarily now greater than 1. But I fail to see what has changed in the analysis -- a and b are both just supposed to be samples from the same distribution, right?<p>This has been driving me crazy, so I'd love some help.<p>Thank you!
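In case it helps, here's roughly how I'm picturing the setup in code (the "shared ability plus independent noise" model is just my guess at how to simulate two midterms, not something I know is right), fitting the least-squares line both ways:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200                                # number of students

    ability = rng.normal(70, 10, n)        # each student's underlying level
    a = ability + rng.normal(0, 8, n)      # midterm 1 = ability + noise
    b = ability + rng.normal(0, 8, n)      # midterm 2 = ability + independent noise

    slope_b_on_a = np.polyfit(a, b, 1)[0]  # least-squares line with a on the x-axis
    slope_a_on_b = np.polyfit(b, a, 1)[0]  # least-squares line with b on the x-axis

    print("slope of b on a:", slope_b_on_a)
    print("slope of a on b:", slope_a_on_b)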
Don't do a least-squares line; that doesn't help. In the first plot, assuming the scores are normalized, you'll see that in general:<p>x < mean => y > x<p>x > mean => y < x<p>Regression to the mean is that most people move towards the mean in subsequent games/attempts/whatever.<p><i>But I fail to see what has changed in the analysis -- a and b are both just supposed to be samples from the same distribution, right?</i><p>Not at all. b is not independent of a; that's the whole point of regression to the mean. If you take ordered pairs where there is no connection between a and b, then you won't get any regression to the mean -- you'll get points essentially randomly placed on the plane.
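To see the first part in numbers, here's a quick simulation (the "ability plus noise" model is just an illustrative assumption, not the only way the scores could be related): split students by whether they were below or above the mean on the first exam and compare each group's average on the second exam.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    ability = rng.normal(70, 10, n)
    a = ability + rng.normal(0, 8, n)   # first exam
    b = ability + rng.normal(0, 8, n)   # second exam -- correlated with a via ability

    below = a < a.mean()                # students below the mean on exam 1
    above = ~below

    # Each group's second-exam average sits closer to the overall mean
    # than its first-exam average did.
    print("below-mean group: exam1 avg %.1f -> exam2 avg %.1f" % (a[below].mean(), b[below].mean()))
    print("above-mean group: exam1 avg %.1f -> exam2 avg %.1f" % (a[above].mean(), b[above].mean()))
    print("overall exam2 mean: %.1f" % b.mean())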
Regression to the mean does not imply that the first slope should be less than one.<p>If for some reason only the above-average students regressed, then the slope would be less than 1. But regression to the mean also affects the scores of students who started below average; as a group we should expect them to regress <i>upward</i> toward the mean. Combine the two groups, and the effects exactly cancel out, leaving a slope of 1.<p>(Since you say the slope "should be" less than 1, I assume the scores are normalized somehow so that the mean score for exam A is the same as the mean for exam B.)