I don't see the problem. The R-squared is 0.01 for blue/red predicting individual votes, because both of the states in question are really just different shades of purple. The R-squared is 1.00 for predicting the total vote share and which party wins the state, because of course the red/blue binary completely determines those.
> The two states are much different in their politics!<p>Are they? Sounds like they're both swing states, pretty close to 50-50, so which state you're from doesn't have a big effect on what your politics are likely to be. Which is exactly what the R^2 tells us. Where's the paradox?
My statistics are a little rusty, so I might be off here. Someone correct me if I have this wrong. R^2 = 1 would mean every voter in one state votes blue and every voter in the other votes red. R^2 = 0 would mean both states are exactly even between red and blue. The states are a lot closer to that. Again, my statistics are rusty so I'm not sure if this next part is valid, but the square root of .01 is .1, which doesn't seem like such a bad representation of the situation.
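The square-root intuition checks out: the correlation between a 0/1 state indicator and a 0/1 vote is exactly 0.1 in magnitude. A quick sketch in plain Python, using hypothetical exact 55/45 splits with 100 voters per state (these numbers are illustrative, not real election data):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# state: 0 or 1; vote: 1 = red. State 0 splits 55/45, state 1 splits 45/55.
state = [0] * 100 + [1] * 100
vote = [1] * 55 + [0] * 45 + [1] * 45 + [0] * 55

print(round(pearson_r(state, vote), 3))  # -0.1, i.e. |r| = sqrt(R^2) = 0.1
```

So the state/vote correlation really is 0.1, consistent with R^2 = 0.01.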
Part of this has to do with the fact that our intuitive sense of effect sizes doesn't really use proportions; we subconsciously start factoring in sample sizes.<p>If a state had an election with millions of voters and got a 55-45 result, it would be a decisive landslide victory; if an elementary school classroom of 20 voters had a 55-45 split, it'd be the narrowest possible margin of victory.<p>Most would likely say that the effect in the former 'feels' much larger even though the proportions are identical, which suggests that under the hood we're factoring sample size into our intuition about effect sizes (probably something chi-square-ish).<p>The result is that the framing of the problem can change our sense of how big the effects are. When we hear that these are state-level elections, we think it's a huge effect and feel that we should be able to do reverse inference. If it were reframed as an election with a much smaller sample, the paradox would disappear and you'd say "of course you wouldn't be able to reverse that inference".
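The "chi-square-ish" hunch is easy to make concrete: the same 55-45 proportion gives wildly different goodness-of-fit statistics against a 50-50 null at different sample sizes. A sketch with made-up counts (a million-voter state vs. a 20-voter classroom):

```python
def chi_square_vs_even_split(wins, losses):
    """Goodness-of-fit chi-square statistic against an expected 50-50 split."""
    n = wins + losses
    expected = n / 2
    return (wins - expected) ** 2 / expected + (losses - expected) ** 2 / expected

print(chi_square_vs_even_split(550_000, 450_000))  # 10000.0 -- overwhelming evidence
print(chi_square_vs_even_split(11, 9))             # 0.2 -- indistinguishable from noise
```

Identical proportions, but the statistic scales linearly with n, which matches the intuition that a state-level 55-45 "feels" decisive while a classroom 11-9 doesn't.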
R2 is more simply explained as the share of the error variance explained by the model relative to the baseline of always guessing the mean, which in this case is 0.5.<p>Guessing 0.5 will have you wrong by 0.5 100% of the time. SST is 25 for a 100-sample example.<p>Guessing 0.55 for the 0.55 state will have you wrong by 0.45 55% of the time and by 0.55 45% of the time, and symmetrically for the other state. SSE is 24.75.<p>1 - 24.75 / 25 = 0.01<p>Looking at it this way it’s not too hard to see why the R2 is bad. It barely explains any more of the variation in individual behavior than the basic guess.<p>R2 is not a great metric for percentages or classification problems like this.
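The parent's arithmetic can be reproduced analytically (working with expected squared errors, so the 55/45 split doesn't need to divide a whole number of voters; n = 100 as in the comment):

```python
n = 100
# Baseline: always guess the overall mean 0.5.
# Each voter is 0 or 1, so the squared error is 0.25 either way.
sst = n * (0.5 * (1 - 0.5) ** 2 + 0.5 * (0 - 0.5) ** 2)

# Model: guess 0.55 in the 0.55 state (the 0.45 state is symmetric).
# Wrong by 0.45 with probability 0.55, wrong by 0.55 with probability 0.45.
sse_per_voter = 0.55 * (1 - 0.55) ** 2 + 0.45 * (0 - 0.55) ** 2
sse = n * sse_per_voter

r2 = 1 - sse / sst
print(sst, round(sse, 2), round(r2, 4))  # 25.0 24.75 0.01
```

Which recovers the comment's SST = 25, SSE = 24.75, and R2 = 0.01 exactly.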
The math is correct, but I think the model used is not, since it doesn't reflect that the variable s is dichotomous; a mixed model should be used instead. If we insist on treating s as continuous, we could think of this example: s = state is encoded as a continuous variable between -1 and 1, for a population where people change state frequently. s = -1 means the person will vote in the blue state with probability 1, s = 1 means they will vote in the red state with probability 1, and s = 0 means they are equally likely to vote in either state. When s is near zero the model cannot predict the voter's preferences, and that is the reason for the low predictive power of this model for a continuous s. The extreme cases s = -1 and s = 1 could be rare for populations that move between states frequently, so the initial intuition is misled into this paradox.
This comes up a lot in genetics. One crowd says "polygenic scores for education don't tell you much, because look how low the R-squared is!" Another crowd (including me) says "polygenic scores for education are a big deal, because look how big the effect size is!"
What paradox? People don't vote a particular way because they live in a state. The logic here would imply "Welp, I live in Kentucky, so I guess Red?" would be the expected mode at the voting booth.
There are two ways to resolve the paradox:<p>1. If you insist on using the R-squared (i.e., a linear regression measure), then properly center and normalize your data, and model what you actually predict: the difference between the baseline (0.5) and the probability of voting for party 0 or party 1. If you model the outcomes as 0/1 without this, then you are using a model made for Gaussian variables on what should be a logistic regression.
2. If you can live with something that more accurately captures the idea of "explanatory power", you can use a GLM (logistic link function), do a logistic regression, and then use the log odds or another measure.<p>In both cases, the variance explained by the state that you are in is 1, because of course it is; that's how the thought experiment is constructed: p(vote for party 1) = 0.5 + \delta(state).<p>"Paradoxes" like this are often interesting in the sense that they point to the math being the wrong math, or to you using it wrong. But instead people tend to assume that they are obviously understanding things correctly, so it must be some weird property of the world (which is then sometimes used to construct faulty conclusions, as in some of the cited papers).
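A quick sketch of the log-odds view in plain Python. The 0.55/0.45 probabilities are the thought experiment's; the McFadden pseudo-R^2 at the end is my addition, included for comparison:

```python
import math

p_red, p_blue = 0.55, 0.45  # p(vote for party 1) in each state

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

# Effect on the log-odds scale: the coefficient a logistic regression
# on the state dummy would recover exactly in this saturated setup.
beta = logit(p_red) - logit(p_blue)
print(round(beta, 3))  # 0.401, i.e. an odds ratio of exp(0.401) ~ 1.49

# McFadden pseudo-R^2 (expected log-likelihood per voter vs. null p = 0.5).
ll_null = math.log(0.5)
ll_model = p_red * math.log(p_red) + (1 - p_red) * math.log(1 - p_red)
print(round(1 - ll_model / ll_null, 4))  # 0.0072
```

The log-odds coefficient (~0.4, odds ratio ~1.49) is a directly interpretable effect size. Interestingly, a likelihood-based "variance explained" measure stays tiny at the individual level too, which supports the view that the small number reflects individual-level unpredictability rather than a measurement artifact.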
From (1): "On the other hand, if the variation between the group means and the grand mean is small, and the variation within groups is large, this suggests there are no real differences in the group means, i.e. the variation we observe is just sampling variation."<p>The above is in the context of analysis of variance. In our example the means in each state are 0.55 and 0.45 and the grand mean is 0.50, so the first summand (between groups) is small; but the variances in the red and blue states are both 0.2475, a large summand, so the variation we observe is just sampling variation. Hence the state factor is not important, and that explains the low R^2 value. Note that in each state the model's predicted value is that group's mean. So analysis of variance explains that the OP result is not a paradox or anything strange.<p><a href="https://saestatsteaching.tech/analysis-of-variance" rel="nofollow">https://saestatsteaching.tech/analysis-of-variance</a>
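The ANOVA decomposition can be sketched directly for the two-state example (100 voters per state with exact 55/45 splits; the counts are illustrative):

```python
n = 100
groups = {
    "red":  [1] * 55 + [0] * 45,   # mean 0.55
    "blue": [1] * 45 + [0] * 55,   # mean 0.45
}
grand = sum(sum(g) for g in groups.values()) / (2 * n)   # grand mean 0.5

# Between-group sum of squares: n * (group mean - grand mean)^2, summed.
ss_between = sum(n * (sum(g) / n - grand) ** 2 for g in groups.values())

# Within-group sum of squares: squared deviations from each group's mean.
ss_within = 0.0
for g in groups.values():
    m = sum(g) / n
    ss_within += sum((x - m) ** 2 for x in g)

print(round(ss_between, 4), round(ss_within, 4))  # 0.5 49.5
print(round(ss_between / (ss_between + ss_within), 4))  # 0.01 -- the R^2 again
```

Almost all of the total variation (49.5 of 50) is within-state, which is exactly the "large within-group variance" situation the quoted passage describes.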
Isn’t the phenomenon just related to the way the vote options are encoded? Use different methods and you will see different R^2 results.
Aren’t the votes artificially represented on a continuous domain for the R^2 calculation, when the actual values are categorical?
As other commenters have pointed out in one way or another, the problem seems to actually be that this simplistic model of voter choice can't capture all the structure of the real world that humans can quickly infer from the setup. Things like: state elections have millions of voters, 55/45 is actually a decisive win, not a narrow one, etc.<p>In a generic setup, imagine you have a binary classifier that outputs probabilities in the .45-.55 range - likely it won't be a really strong classifier. You would ideally like polarized predictions, not values around .5.<p>Come to think of it, could this be an issue of non-ergodicity too (hope I'm using the term right)? i.e. a state-level prior is not that informative w.r.t. an individual vote?
If voters are split 60-40 on an issue, that doesn't mean that the odds are 60-40.<p>You should instead be asking: what are the odds that X voters could change their vote?
The states (and even more so the sub-state regions) really are much more different than what you would think just looking at R vs D. A Democrat in a city the Democrats win 90-10 is likely a very different Democrat from one where they lose 60-40.
Nothing but endless Cloudflare captchas here for me.<p>Removing cookies for the domain doesn't help, because (doh) I've never visited it before.