科技回声

4 comments

tzs, over 12 years ago

A recent discussion on /r/statistics: <a href="http://www.reddit.com/r/statistics/comments/11ydmt/20082012_election_anomalies_results_analysis_and/" rel="nofollow">http://www.reddit.com/r/statistics/comments/11ydmt/20082012_...</a>

And this: <a href="http://www.reddit.com/r/statistics/comments/123pt7/would_rstatistics_care_to_critique_the/" rel="nofollow">http://www.reddit.com/r/statistics/comments/123pt7/would_rst...</a>

And this, from 8 months ago: <a href="http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_you_debunk_this_some_people_with/" rel="nofollow">http://www.reddit.com/r/AskReddit/comments/qb9ea/reddit_can_...</a>

The result seems to be that the more one knows about statistics, the less convincing the case for election manipulation is. All the analysis actually seems to show is that Romney did better in precincts with a large number of votes cast, which is pretty much what we'd expect, as those are more likely to be precincts in denser areas.

By sorting the precincts by number of votes cast, and then plotting the percentage split of the cumulative votes across that sorted list, they are pretty much forcing the general shape of the curves that they get. The smaller rural precincts (1) are more likely to go for other candidates, and (2) are going to show much more variation, making it almost certain that the split on the left side of the graphs will be far from the split on the right side.

People also seem to be making much of the way that, when Romney's curve moves up on the graph, one or more of the other curves move down by an amount that EXACTLY BALANCES Romney's gain, and they conclude that it must have been vote flipping.
No, it is because they are showing the percentage split, which BY DEFINITION must add to 100, and so any change in one curve must be balanced exactly by the net change in the other curves.

Another thing worth noting is that exit polls agreed well with the reported results, as did news organization forecasts based on early returns, as did projections based on pre-vote polling. If there were significant fraud, it would throw those off (unless the fraudsters managed to ALSO rig the polls and projections...).

EDIT: here's an example. Take a set of precincts where the numbers of votes cast match those of the 2012 Arizona GOP primary, with "large" meaning 150 or more votes, "small" meaning fewer than 50 votes, and "medium" being anything else. Suppose the distribution of votes is 44.6% Romney, 31.4% Santorum, 16.0% Gingrich, and 8.0% Paul in large precincts; 29.6%, 36.4%, 21.0%, and 13.0% in medium precincts; and 14.6%, 38.4%, 27.5%, and 19.5% in small precincts. Here's what the curve of the percentage split of the cumulative vote total, sorted by precinct size, looks like for a simulated election: <a href="http://imgur.com/s4SrE" rel="nofollow">http://imgur.com/s4SrE</a>

My numbers there aren't meant to reflect actual Arizona numbers. They are just meant to illustrate the kind of curve we expect if there is a distribution difference that correlates with precinct size.

Romney is blue, Santorum is yellow, Gingrich is green, and Paul is red. Note the similarity to the actual Arizona curve.
I expect that if someone could dig up polling data from each precinct from before the vote, and use that to control the vote distribution for each precinct, the match would be very close.

If anyone wants to play around with this, here's some quick and dirty Python code:

<pre><code>
#!/usr/bin/env python3
import random

import matplotlib.pyplot as plt

# Number of votes cast in each precinct, sorted in ascending order.
precinct_size = [
    1, 1, 1, 3, 5, 7, 8, 9, 10, 12, 18, 19, 21, 26, 28, 29, 30, 30, 34, 41,
    46, 50, 56, 57, 57, 58, 58, 61, 68, 69, 72, 78, 79, 85, 88, 94, 99, 100,
    103, 109, 109, 120, 126, 126, 129, 132, 133, 133, 136, 139, 139, 141,
    147, 152, 157, 158, 162, 162, 165, 166, 169, 172, 173, 175, 177, 177,
    179, 181, 192, 201, 224, 231, 235, 238, 246, 249, 249, 251, 258, 264,
    268, 270, 272, 276, 277, 281, 281, 293, 315, 322, 327, 333, 334, 337,
    346, 348, 349, 350, 363, 366, 367, 369, 370, 374, 374, 384, 385, 386,
    387, 394, 397, 405, 407, 412, 413, 419, 420, 421, 425, 429, 434, 438,
    439, 449, 467, 474, 483, 483, 496, 502, 505, 507, 508, 513, 518, 525,
    526, 538, 545, 548, 555, 560, 571, 577, 581, 583, 584, 606, 606, 612,
    620, 620, 625, 635, 641, 646, 647, 650, 652, 662, 663, 666, 669, 673,
    674, 710, 711, 721, 728, 737, 744, 745, 747, 747, 780, 786, 787, 790,
    791, 818, 824, 833, 834, 878, 899, 903, 927, 928, 949, 961, 1002, 1031,
    1059, 1070, 1133, 1587,
]

# Vote shares in the order: romney, santorum, gingrich, paul
vote_dist = [.446, .314, .160, .080]   # large precincts (150 or more votes)
vote_dist2 = [.296, .364, .210, .130]  # medium precincts (50 to 149 votes)
vote_dist3 = [.146, .384, .275, .195]  # small precincts (fewer than 50 votes)


def vote(d):
    """Return a random candidate index drawn from distribution d."""
    r = random.random()
    s = 0
    for v in range(len(d)):
        s += d[v]
        if s >= r:
            return v
    # Rounding can leave the running sum just below r; fall back to the
    # last candidate instead of returning None.
    return len(d) - 1


def trial():
    total = [0, 0, 0, 0]  # cumulative votes per candidate
    x = []
    romney = []
    santorum = []
    gingrich = []
    paul = []
    i = 1
    vt = 0  # cumulative votes cast
    for s in precinct_size:
        for voter in range(s):
            if s < 50:
                v = vote(vote_dist3)
            elif s < 150:
                v = vote(vote_dist2)
            else:
                v = vote(vote_dist)
            total[v] += 1
            vt += 1
        # One data point per precinct (assumed; the original indentation
        # did not survive).
        x.append(float(i))
        i += 1
        romney.append(total[0] / vt)
        santorum.append(total[1] / vt)
        gingrich.append(total[2] / vt)
        paul.append(total[3] / vt)
    plt.plot(x, paul, 'ro')
    plt.plot(x, gingrich, 'go')
    plt.plot(x, santorum, 'yo')
    plt.plot(x, romney, 'bo')
    plt.show()


trial()
</code></pre>

EDIT 2: I'm not actually sure that the precinct sizes match Arizona. I got the numbers from a spreadsheet attached to one of the news stories about this. However, the total number of votes is only about 75k, which is much less than the correct number for Arizona.
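A minimal sketch of why the curves have to balance, using made-up precinct counts (the numbers and the helper name `cumulative_shares` are hypothetical, not taken from the Arizona data). Each plotted point is a share of the cumulative total, so the shares at every point sum to 1, and the changes between consecutive points therefore sum to 0 however the votes were cast:

```python
def cumulative_shares(precinct_counts):
    """For each precinct, in order, return every candidate's share of all
    votes counted so far. precinct_counts is a list of per-precinct tuples
    holding one vote count per candidate."""
    running = [0] * len(precinct_counts[0])
    shares = []
    for counts in precinct_counts:
        running = [r + c for r, c in zip(running, counts)]
        total = sum(running)
        shares.append([r / total for r in running])
    return shares


# Hypothetical counts for three candidates, precincts sorted by size.
counts = [(3, 5, 2), (10, 12, 8), (60, 30, 10), (200, 90, 40)]
shares = cumulative_shares(counts)

# Change in each candidate's cumulative share from one precinct to the next.
deltas = [
    [cur - prev for prev, cur in zip(before, after)]
    for before, after in zip(shares, shares[1:])
]

for row in deltas:
    # The first candidate's gain is offset by the net loss of the others.
    print([round(d, 4) for d in row], "sum:", round(sum(row), 12))
```

A gain for one candidate that is matched exactly by losses elsewhere is a property of the plot itself, so it cannot distinguish vote flipping from an ordinary shift in support between small and large precincts.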


polemic, over 12 years ago

The take home message:

> "...highly anomalous election results indicate a widespread, systematic exchange of votes favoring one candidate"

> "Mitt Romney, based on our analysis, should have (statistically) gotten third rank in Iowa’s election (as opposed to second); second rank in New Hampshire (as opposed to the first rank), and so on, resulting most likely to a brokered convention at the Republican National Convention in Tampa, FL."

Statistics is awesome, although I suspect that public/media knowledge will mean this is brushed aside.

yk, over 12 years ago

The main problem with this paper seems to be 'publication bias.' By this I mean: suppose n statisticians look at a certain fair election, each using a novel method. Then we expect βn papers to come out of it, where β is the chance of falsely rejecting the hypothesis that the election was fair. Since each statistician uses a number of tests, which may or may not correlate with each other, it is all but impossible to establish a sufficient level of certainty without committing in advance to the statistical tests used.
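To put rough numbers on this, here is a small sketch. The function names and the values n = 20 and β = 0.05 are illustrative assumptions, and the second function additionally assumes the tests are independent, which the comment above notes may not hold:

```python
def expected_false_positives(n_tests, beta):
    """Expected number of false rejections among n_tests analyses of a
    fair election, each with false-positive rate beta."""
    return n_tests * beta


def prob_at_least_one_false_positive(n_tests, beta):
    """Chance that at least one analysis falsely rejects a fair election.
    Assumes the analyses are statistically independent."""
    return 1 - (1 - beta) ** n_tests


n, beta = 20, 0.05  # hypothetical values
print(expected_false_positives(n, beta))
print(round(prob_at_least_one_false_positive(n, beta), 3))
```

With these assumed values, about one spurious "anomaly" is expected, and the chance of at least one is roughly 64%. That is why the tests need to be fixed before looking at the data.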

cantastoria, over 12 years ago

Hacker News?


Statistical Analysis of Election Manipulation in Republican Primaries
