Anomalies in Vote Counts: Follow-up and Conclusions
Answering questions and providing more analysis
Our first piece, Anomalies in Vote Counts and Their Effects on Election 2020, generated quite a bit of interest. Here, we hope to answer a variety of questions which we received (in public and in private) and to answer concretely the question in which everyone is interested — how likely is it that this would have occurred in the normal course of events?
To begin, we’ll consider some questions and criticisms we received. We then provide some additional analysis which cements our conclusions in the previous report. An updated appendix for this report which contains all of the data and code needed to reproduce the content of both posts can be found here.
The motivating case here was to investigate the large changes which occurred overnight in Michigan and Wisconsin. Not only were these surprising, and not only did they radically change the state of the race (both in terms of how many electoral votes each candidate was projected to receive and in terms of the perception of an inevitable Biden victory in Georgia and Pennsylvania once all mail-in ballots were counted), but they also occurred during some of the most suspicious windows of time in terms of on-the-ground circumstances.
As our original analysis notes, it remains unclear when and where — and most importantly, precisely why — the vote counting was stopped (or appeared to stop) in certain locations. These were not the only places with suspicious on-the-ground circumstances, but the coincidence of suspicious circumstances with huge, nationally race-altering vote updates raised eyebrows. This encouraged us to look more closely, as we would hope it would encourage any citizens of a free country.
It is instructive to consider what the US State Department and other western organizations consider evidence of electoral fraud in other countries. Expulsion of observers figures prominently among these reports, as do other parallels with this election, such as suspended vote-counting, hard-to-explain late software updates, and even voting methods which differ by candidate. These parallels themselves do not prove fraud, but they suggest where to look. In the simplest sense, they are smoke, not fire, and our work suggests that there is smoke when assessed from a different angle as well.
Consider, for example, this account of the 1988 Mexican federal election, drawn from a description of the autobiography of former president Miguel de la Madrid, who admitted to the cheating (or, at the very least, to his party’s candidate being the beneficiary of cheating on his behalf).
“Initial results from areas around the capital showed that Mr. Salinas was losing badly to the opposition leader Cuauhtémoc Cárdenas. ‘I felt like a bucket of ice water had fallen on me,’ Mr. de la Madrid recalled. ‘I became afraid that the results were similar across the country and that the PRI would lose the presidency.’
Thus began the frantic staging of a fraudulent victory. In his writing of the event, the all-powerful former president chooses his words carefully and describes himself more like a supporting actor than the lead strategist. If he did anything wrong, it was on the advice of his staff, and for the stability of the nation.
On election night 1988, Mr. de la Madrid said, the secretary of the interior advised him that the initial results were running heavily against the PRI. The public demanded returns, Mr. de la Madrid wrote. And rather than giving them, the government lied and said the computer system tabulating the votes had crashed.
This was the advice to Mr. de la Madrid from the president of the PRI: ‘You have to proclaim the triumph of the PRI. It is a tradition that we cannot break without causing great alarm among the citizens.’
As midnight approached, Mr. de la Madrid learned that the leading opposition candidates were preparing to add more confusion to the outcome of the election by each declaring himself the winner. The PRI, he decided, had to pre-empt them, and without any official vote count, the president of the PRI declared his party the winner. A beleaguered Mr. Salinas did not show his face until the next day.
‘The electoral upset was a political earthquake for us,’ Mr. de la Madrid wrote. ‘As in any emergency, we had to act because the problems were rising fast. There was not a moment for great meditation, we needed agility in our response to consolidate the triumph of the PRI.’”
This scenario, in which a candidate is losing substantially, vote-counting is paused (or is claimed to be paused), only for the previously-losing candidate to emerge victorious, matches not only the 2020 US election but also the 2019 elections in Bolivia which ultimately led to former President Evo Morales fleeing the country. From a report by the BBC:
“Hours after polling booths closed on Sunday, the Supreme Electoral Tribunal released the first results of the quick count. With 83.8% of the votes verified, its website showed Mr Morales leading with 45.3%, leaving Mr Mesa in second place with 38.2%.
That result suggested there would be a run-off, prompting celebrations in the campaign camp of Mr Mesa, who jubilantly declared: ‘We’ve made it to the second round!’
But then the website with the quick count stopped being updated for 24 hours, prompting electoral observers from the Organisation of American States (OAS) to express their concern. As counting was suspended, Mr Morales told his supporters he was confident that when votes from rural areas were tallied, there would be no need for a run-off. When the quick count was finally updated on Monday evening, Mr Morales had a lead of 10.12 percentage points - just wide enough to stave off a second round.
The OAS electoral mission called the change ‘drastic and hard to explain.’”
The report then links to this tweet from former US diplomat Michael G. Kozak:
As can be seen in these cases, it is not unreasonable, by the standards of western media and governments, to be suspicious when vote-counting is stopped with little (if any) explanation.
To say the least, there are a great number of unusual and suspicious facts about how these proceedings were conducted. A quantitative examination of the available data, in our view, was therefore warranted.
Once we decided to look at two particular updates in Michigan and Wisconsin, we asked: what features make these updates seem weird, and are they actually weird quantitatively?
To begin, it’s important to note that the “spikiness” of the vote spikes is actually a function of a variable we consider neither in the last post nor here, namely how clustered successive updates are with respect to time. Since the updates come from all over a state, from precincts which are counting at their own pace, whether any two given updates are five milliseconds or five minutes apart is not likely to be a meaningful indicator of the legitimacy of either of them. As such, the “slope” between consecutive updates, computed as the change in the number of votes over a window of elapsed time, is likely to be a prohibitively noisy indicator.
Others have suggested that a better way to assess this would be to simply look at which updates provided the largest share of the final total for the winner. Along these lines, we are told, the updates we highlight no longer look important. This, however, disregards a crucial quality of the metric we chose: unlike the alternative metric proposed above, no update has to exhibit a high co-extremity score, whereas in every election some update necessarily provides the largest share of the winner’s total. When searching for anomalies or discrepancies, it is essential that the chosen metrics and statistics do not simply reflect a necessary reality.
In short, we saw these as they came in, were surprised (as most were), and then developed a fairly simple metric which we thought expressed the relevant characteristics of these vote updates (based on our initial analysis of looking at a few states’ data in different ways), and found that these updates were in fact far more extreme than virtually all others.
Some Notes on the Data
Unfortunately, state-level data was the only complete time-series data available to the general public. Our attempts to reach Edison Research Group were unsuccessful and the only other national time-series data we could find — this one, which was actually broken down by county — didn’t start until about 5:40am EST on 11/4, which excluded the period during which the majority of votes were counted and thus was insufficient for any kind of thorough time-series analysis.
Our goal was, and has remained, to quantitatively assess the characteristics of the “vote spikes” which occurred in the early hours of November 4, 2020, and to present a rigorous quantitative analysis of their characteristics in order to determine whether they are actually odd or merely appear to be so. We are an ad hoc group of researchers who have no advocacy arm and have no political agenda. Our lack of a social media presence is perhaps the most useful indicator of this.
Without wading into the political content of these events, it was our opinion that too much of the discourse consisted of either wild, speculative claims or their (perhaps deserved) outright dismissal. These outright dismissals typically either restated the claims in mocking tones or pointed vaguely to past studies which purportedly reached the conclusion that “voter fraud is exceedingly rare.” Consider, for example, this AP article headline: “AP Explains: Election’s validity intact despite Trump claims.” Opening this, one would expect either some sort of defense of the election proceedings themselves or a rebuttal to specific claims, and yet it only points to failed lawsuits and says that, “State officials from both parties… have also stated that the 2020 election went well,” a highly questionable claim which, even if believed, is not itself very convincing. It is hard to believe that a denial of misconduct by officials would be treated as so dispositive in other contexts.
This sort of shoddy “defense” of election proceedings does little to assuage the public. We decided to look (again, motivated by a small set of specific cases) and wished to inform readers of what we found. Given the extraordinarily charged nature of the subject, having a rational and measured conversation about statistical artifacts is inherently difficult. We hoped to move the conversation in that direction by producing a report which, to the extent possible, considered only the math and left out the politics.
Interpreting the Odds
The most critical question is, without a doubt, “How likely is it that this would occur in the normal course of events?” There are many angles from which we can attempt to answer this question, since the definition of “this” in the question has many possible values.
In order to answer the question, it is necessary to state the null hypothesis appropriately. In particular, it is clear from a cursory examination of election returns data that votes counted much later in the process overwhelmingly come in very small updates. Since this data set extends over about two weeks, we risk overstatement if there is a correlation between time and co-extremity.
Examining the data set, we see that there is indeed a small negative correlation (with a low p-value) between time and co-extremity. Restricted to updates which occurred before midnight EST on (Thursday) November 5th, however, there is no correlation.
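The correlation check described above can be sketched as follows. This is a minimal illustration rather than the code in the appendix: the function names `pearson_r` and `approx_two_sided_p` are hypothetical, and the p-value relies on a normal approximation to the usual t-statistic, an assumption which is only reasonable for samples of several thousand updates such as this one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for the null hypothesis of no correlation.

    Assumption: the t-statistic is treated as standard normal,
    which is adequate only when n is large.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

# Coefficient and sample size as reported for the restricted data set.
p_restricted = approx_two_sided_p(-8.60e-3, 6775)
print(f"approximate p-value: {p_restricted:.3f}")
```

With a coefficient of roughly -8.60e-3 over 6,775 updates, this approximation gives a p-value of about 0.48, in line with the footnoted figure (assuming that footnote refers to these 6,775 updates).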
Here is a graph which plots the co-extremity of each of the 6,775 vote updates which occurred prior to midnight (EST) on November 5th:
Fig.1. The X-axis is the time, in EST, when each vote update arrived and the Y-axis is the co-extremity score for each vote update.
Out of the 6,775 updates shown here, only 196 are from Michigan, Wisconsin, and Georgia between 1:30 and 6:35am EST. As the reader can see, most of the red, green, and yellow dots plotted above (representing vote updates in Michigan, Wisconsin, and Georgia, respectively) are far less co-extreme than the four we have highlighted as suspicious — including those which occurred overnight. We can now ask ourselves:
What are the odds that the two most co-extreme updates would be among these 196?
What are the odds that three of the four most co-extreme updates would be among these 196?
What are the odds that four of the seven most co-extreme updates would be among these 196?
If we assume that we are sampling with replacement, these values are:
Note further that these numbers do not require assessing the unlikelihood of any individual update being this co-extreme. Nor do they require, contra the claims of some commentators, that we call the final results of any particular states or counties into question. This result merely calls into question the likelihood of vote updates from three states during a five-hour, five-minute window (constituting roughly 2.89% of all of the vote updates prior to midnight EST on November 5th) being so concentrated among the most co-extreme vote updates across the entire country.
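One way to reproduce odds of this kind is with a binomial model, sketched below. The counts 196 and 6,775 come from the text above; everything else is an assumption made for illustration. In particular, the helper `prob_at_least` is a hypothetical name, each of the top-ranked updates is treated as an independent draw (sampling with replacement), and "k of the top n" is read as "at least k", which may differ from the exact convention used in the linked calculations.

```python
from math import comb

TOTAL_UPDATES = 6775   # updates before midnight EST on November 5th (from the text)
WINDOW_UPDATES = 196   # MI/WI/GA updates between 1:30 and 6:35am EST (from the text)

def prob_at_least(k, n, p):
    """Binomial probability that at least k of n independent draws succeed."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Chance that any single update falls in the overnight window.
p_window = WINDOW_UPDATES / TOTAL_UPDATES

# Assumed reading: "at least k of the n most co-extreme updates are in the window".
two_of_top_two = prob_at_least(2, 2, p_window)
three_of_top_four = prob_at_least(3, 4, p_window)
four_of_top_seven = prob_at_least(4, 7, p_window)

print(f"share of updates in window: {p_window:.2%}")
print(f"two of top two:    {two_of_top_two:.3e}")
print(f"three of top four: {three_of_top_four:.3e}")
print(f"four of top seven: {four_of_top_seven:.3e}")
```

Sampling without replacement would call for a hypergeometric model instead; with only a handful of draws from several thousand updates, the two models should give very similar answers.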
Conclusion and Further Analysis
Various low-quality responses and “debunkings” of our report are generally along the lines of, “This was big cities counting their votes.” Attributing these updates specifically to cities in a report containing only state-level data likely indicates a failure to have even opened the report. It is also worth noting that there are a great number of cities in America, most of which are not in states which have unusually co-extreme updates. There are also several other “battleground” states, and several other states with one or a small number of very Democratic cities and a largely Republican rest of the state (e.g. Texas, Oregon, Minnesota, Illinois). Some have also attributed these updates to laws regulating when mail-in ballots are counted. Pennsylvania, however, had a similar restriction and is a swing state comparable to Michigan, Wisconsin, and even Georgia along a number of demographic and political dimensions, yet it did not have any particularly co-extreme updates or any vote updates which our report considered suspicious.
Explaining away the results we have highlighted thus requires, at a minimum, explaining distinguishing features of Michigan, Wisconsin, and Georgia which are not found in any other states which are demographically or politically comparable. More importantly, however, it requires explaining the distinguishing features of these particular updates with respect to all of the other updates in each of these states which explain why they are so aberrant. It also requires explaining why these decisive updates all occurred overnight, during the most contested period of vote-tabulation.
Thus far, most supposed “debunkings” have simply claimed that these were mail-in ballots from large cities — attributes not at all unique to these updates. Indeed, the few in the media who have noted the existence of this report have tended to comment on the individuals sharing it and do not appear to have bothered to address the content.
Continuing our analysis from our original report cements our confidence in the conclusions we drew there: namely, that Biden’s victory in Michigan, Wisconsin, and Georgia relied on very unusual individual vote updates, which indeed were so extreme that, were they all only at the 99th percentile of co-extremity, Joe Biden would have lost all three states. This does not prove any particular affirmative case of fraud; rather, it points out that what is perhaps the most contested window of time also contained a small number of critically Biden-favoring updates which have measurably unusual characteristics.
It remains our belief that a maximally detailed forensic audit of all of the vote-counting procedures and the ballots themselves, conducted in the public view, is the only possible way to address suspicions surrounding these circumstances.
 Reports of observers being restricted or ejected figure prominently in virtually every report from western media or governments which casts suspicion on an election in Latin America, Eastern Europe, the Middle East, or elsewhere. This report, from the Congressional Research Service, mentions how, “Since no independent international observers were present for Iran’s elections, it is difficult to ascertain the extent of alleged vote rigging or election violations that may have taken place. The expulsion of most foreign journalists from Iran and the government’s interruption of mobile and internet communication have further complicated efforts to gain a clear picture of the events surrounding the election and its aftermath.” This report on the 2004 Ukrainian elections, which sparked the famous “Orange Revolution,” lists as one of its “Examples of the most egregious, widely observed and reported examples of election-day fraud,” that, “Observers from Our Ukraine and other opposition groups were expelled from most polling stations in eastern Ukraine on Election Day. For example, in Territorial Election Commission (TEC) district number 42 in Donetsk oblast, Our Ukraine observers were kicked out of all but a few polling stations.” Coincidentally, it also makes mention of “Illegal Use of Absentee Ballots,” claiming that “Massive electoral fraud was committed through the illegal use of absentee voter certificates.” This is very similar to circumstances in places like Detroit and Atlanta, see e.g.
 As mentioned elsewhere, this article on the 2019 Bolivian election and this article on the 1988 Mexican election are useful primers. Readers may also find interesting this admission of election fraud on behalf of future President Lyndon B. Johnson in 1948 and this article, from 1976, about “political machines” in Chicago and their likely decisive effect on the US Presidential Election in 1960. These involve a mix of (ostensible) pauses in vote-counting along with decisive batches of votes being counted after most (if not all) other votes have already been counted.
 This detailed report on the 2019 Bolivian elections also describes a last-minute software update which appeared to violate security standards. This parallels a report from this year about a late, poorly-understood software update to voting machines in Spalding County, Georgia.
 This piece in The Guardian describes how supporters of the opposition candidate in the 2009 Iranian elections were encouraged to vote in different locations from those of the incumbent President Mahmoud Ahmadinejad.
 The Pearson correlation coefficient is roughly -7.72e-2, with a p-value of roughly 2.63e-13.
 The Pearson correlation coefficient is roughly -8.60e-3, with a p-value of roughly 4.79e-1.
 This calculation can be found here.
 This calculation can be found here.
 This calculation can be found here.
 This is if the ratios are held constant. This is likely a more plausible alternate scenario than holding the margins constant, as constant margins with a smaller ratio would imply that a very large number of votes had been withheld from each candidate in equal measure, an action which would be fraudulent but have no impact on the relevant metric. It is worth noting that this is purely speculative and meant principally as an indication of just how unusual these vote updates were. These should not be interpreted as a definitive claim that the results were legitimately these values.
 This would, at a minimum, involve hand recounts which are ensured to match voter rolls and a full audit of all voting machines. It is our view that conducting this entirely in public view (redacting only names) would be the best way to resolve existing disputes about the accuracy of the results. It is also our view that ballots which were subject to any kind of “adjudication” should have their contents made viewable to and searchable by the public.