A recent discussion on /r/statistics: <a href="http://www.reddit.com/r/statistics/comments/11ydmt/20082012_election_anomalies_results_analysis_and/" rel="nofollow">http://www.reddit.com/r/statistics/comments/11ydmt/20082012_...</a><p>And this: <a href="http://www.reddit.com/r/statistics/comments/123pt7/would_rstatistics_care_to_critique_the/" rel="nofollow">http://www.reddit.com/r/statistics/comments/123pt7/would_rst...</a><p>And this, from 8 months ago: <a href="http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_you_debunk_this_some_people_with/" rel="nofollow">http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_...</a><p>The upshot seems to be that the more one knows about statistics, the less convincing the case for election manipulation is. All the analysis actually seems to show is that Romney did better in precincts with a large number of votes cast, which is pretty much what we'd expect, as those are more likely to be precincts in denser areas.<p>By sorting the precincts by number of votes cast, and then plotting the percentage split of the cumulative votes across that sorted list, they are pretty much forcing the general shape of the curves that they get. The smaller rural precincts (1) are more likely to go for other candidates, and (2) are going to show much more variation, thus making it almost certain that the split on the left side of the graph will be far from the split on the right side.<p>People also seem to be making much of the fact that when Romney's curve moves up on the graph, one or more of the other curves move down by an amount that EXACTLY BALANCES Romney's gain, and so it must have been vote flipping. 
No, it is because they are showing the percentage split, which BY DEFINITION must add to 100, and so any change in one curve must be balanced exactly by the net change in the other curves.<p>Another thing worth noting is that exit polls agreed well with the reported results, as did news organization forecasts based on early returns, as did projections based on pre-vote polling. If there were significant fraud, it would throw those off (unless the fraudsters ALSO managed to rig the polls and projections...).<p>EDIT: here's an example. Take a set of precincts where the numbers of votes cast match those of the 2012 Arizona GOP primary. Call a precinct large if it had 150 or more votes, small if it had fewer than 50 votes, and medium otherwise. Suppose the distribution of votes is 44.6% Romney, 31.4% Santorum, 16.0% Gingrich, and 8.0% Paul in large precincts, 29.6%, 36.4%, 21.0%, 13.0% in medium precincts, and 14.6%, 38.4%, 27.5%, 19.5% in small precincts. Here's what the curve of cumulative vote share, with precincts sorted by size, looks like for a simulated election: <a href="http://imgur.com/s4SrE" rel="nofollow">http://imgur.com/s4SrE</a><p>My numbers there aren't meant to reflect actual Arizona numbers. They are just meant to illustrate the kind of curve we expect if there is a distribution difference that correlates with precinct size.<p>Romney is blue, Santorum is yellow, Gingrich is green, and Paul is red. Note the similarity to the actual Arizona curve. I expect that if someone could dig up polling data from each precinct from before the vote, and used that to control the vote distribution for each precinct, the match would be very close.<p>If anyone wants to play around with this, here's some quick and dirty Python code:<p><pre><code> #!/usr/bin/python
import random
import matplotlib.pyplot as plt
precinct_size = [1, 1, 1, 3, 5, 7, 8, 9, 10, 12, 18, 19, 21, 26, 28, 29, 30, 30, 34, 41,
46, 50, 56, 57, 57, 58, 58, 61, 68, 69, 72, 78, 79, 85, 88, 94, 99, 100,
103, 109, 109, 120, 126, 126, 129, 132, 133, 133, 136, 139, 139, 141, 147, 152,
157, 158, 162, 162, 165, 166, 169, 172, 173, 175, 177, 177, 179, 181, 192,
201, 224, 231, 235, 238, 246, 249, 249, 251, 258, 264, 268, 270, 272, 276,
277, 281, 281, 293, 315, 322, 327, 333, 334, 337, 346, 348, 349, 350, 363, 366,
367, 369, 370, 374, 374, 384, 385, 386, 387, 394, 397, 405, 407, 412, 413, 419, 420,
421, 425, 429, 434, 438, 439, 449, 467, 474, 483, 483, 496, 502, 505, 507, 508, 513,
518, 525, 526, 538, 545, 548, 555, 560, 571, 577, 581, 583, 584, 606, 606, 612, 620,
620, 625, 635, 641, 646, 647, 650, 652, 662, 663, 666, 669, 673, 674, 710, 711, 721, 728,
737, 744, 745, 747, 747, 780, 786, 787, 790, 791, 818, 824, 833, 834, 878, 899,
903, 927, 928, 949, 961, 1002, 1031, 1059, 1070, 1133, 1587]
vote_dist = [.446, .314, .160, .080] # romney, santorum, gingrich, paul
vote_dist2 = [.296, .364, .210, .130]
vote_dist3 = [.146, .384, .275, .195]
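def vote(d):
    # Pick a candidate index at random according to the probabilities in d.
    r = random.random()
    s = 0
    for v in range(len(d)):
        s += d[v]
        if s >= r:
            return v
    # Guard against floating-point rounding leaving the running sum just below r.
    return len(d) - 1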
def trial():
    total = [0, 0, 0, 0]
    x = []
    romney = []
    santorum = []
    gingrich = []
    paul = []
    i = 1
    vt = 0
    for s in precinct_size:
        for voter in range(s):
            # The vote distribution depends on precinct size.
            if s < 50:
                v = vote(vote_dist3)
            elif s < 150:
                v = vote(vote_dist2)
            else:
                v = vote(vote_dist)
            total[v] += 1
            vt += 1
        # One point per precinct: cumulative share after adding this precinct.
        x.append(float(i))
        i += 1
        romney.append(float(total[0])/vt)
        santorum.append(float(total[1])/vt)
        gingrich.append(float(total[2])/vt)
        paul.append(float(total[3])/vt)
    plt.plot(x, paul, 'ro')
    plt.plot(x, gingrich, 'go')
    plt.plot(x, santorum, 'yo')
    plt.plot(x, romney, 'bo')
    plt.show()

trial()
</code></pre>
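A rough sketch of the "small precincts show much more variation" point, separate from the simulation above. Everything in it is made up for illustration: the function name observed_share_spread, the precinct sizes 20 and 1000, and the 40% support figure are assumptions, not Arizona data. It gives every voter the same chance of picking one candidate and measures how much the observed share bounces around from precinct to precinct at each size. The spread should shrink roughly like sqrt(p(1-p)/n).

```python
import random
import statistics


def observed_share_spread(size, p, trials, rng):
    """Std. deviation of a candidate's observed share across simulated precincts.

    Each of `trials` precincts has `size` voters, and every voter picks the
    candidate independently with probability `p` (an illustrative assumption).
    """
    shares = []
    for _ in range(trials):
        votes = sum(1 for _ in range(size) if rng.random() < p)
        shares.append(votes / size)
    return statistics.pstdev(shares)


rng = random.Random(1)  # fixed seed so reruns give the same numbers
small_spread = observed_share_spread(20, 0.40, 500, rng)    # hypothetical small precinct
large_spread = observed_share_spread(1000, 0.40, 500, rng)  # hypothetical large precinct

print("spread of observed share, 20-vote precincts:   %.3f" % small_spread)
print("spread of observed share, 1000-vote precincts: %.3f" % large_spread)
```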
EDIT 2: I'm not actually sure that the precinct sizes match Arizona. I got the numbers from a spreadsheet attached to one of the news stories about this. However, the total number of votes is only about 75k, which is much less than the correct number for Arizona.
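<p>One more sketch, for the "EXACTLY BALANCES" point. This is a minimal illustration with made-up data, not the Arizona numbers: the precinct counts are random, and the helper name cumulative_shares is mine. It computes the cumulative share curves and checks that, at every step, the change in the first candidate's curve is cancelled by the combined change in the other curves, with no vote flipping anywhere in the data.

```python
import random


def cumulative_shares(precinct_counts):
    """For each precinct in order, return every candidate's share of all votes so far."""
    n = len(precinct_counts[0])
    totals = [0] * n
    shares = []
    for counts in precinct_counts:
        for c in range(n):
            totals[c] += counts[c]
        cast = sum(totals)
        shares.append([t / cast for t in totals])
    return shares


random.seed(0)
# Hypothetical data: 50 precincts, 4 candidates, arbitrary honest counts.
precincts = [[random.randint(1, 200) for _ in range(4)] for _ in range(50)]
shares = cumulative_shares(precincts)

max_imbalance = 0.0
for prev, cur in zip(shares, shares[1:]):
    deltas = [b - a for a, b in zip(prev, cur)]
    # Candidate 0's change plus the net change of everyone else should be zero.
    max_imbalance = max(max_imbalance, abs(deltas[0] + sum(deltas[1:])))

print("largest imbalance between one curve and the rest:", max_imbalance)
```

<p>The printed imbalance should be zero up to floating-point rounding, for any counts you feed in. The balancing is a property of plotting shares, not evidence about the votes.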