63 Comments

You did not answer the most burning question from the first post: who are you, and what are your credentials to even do detailed data analysis? Some have said you are anonymous for safety reasons. Sorry, that is not good enough. Your speculation and others' have put the lives of hundreds, if not thousands, of public servants, whose names are known, into real and imminent danger. I have immediate family members ready to go to open armed rebellion and civil war. I won't give this, or any other "analysis", ANY credibility unless it is publicly signed by the authors.

Their credibility rests in the report. I am guessing that you're incapable of understanding it, despite its very accessible nature.

Someone below called you a "joke of a human". This may be true. What is definitely true is that your attempts to discredit the analysis via logical fallacies are a joke.

Whatever happened to civil discourse? For the record, your ad hominem argument adds nothing substantive to the conversation. For actual facts, I'd point you to the results of the Republican-driven partisan recount in Arizona.

Lol.

An ad hominem would be if "I" called you a "joke of a human". I don't know, it may be true (as I stated) and is as likely as anything else, given your lack of grasp of logic or apparently math and your pathetic resort to credentialism.

Go away back to mommy.

Give me your real name and email address and I'd be happy to have a conversation about math, science, and logic credentials. SMH. Let me ask you, if I could provide evidence that there was no election fraud in 2020 that could have overturned the results, would you even want to see it or could you believe it? By the way, the inference that you made that I might be a "joke of a human" was an ad hominem attack. Also, "go back to mommy" is ad hominem, FWIW.

"provide evidence that there was no election fraud in 2020 that could have overturned the results"

Thus, you can apparently provide evidence of actual election fraud, but that it was insufficient to have overturned the results.

Reality check: This has been true in all or most elections since, well, forever. This is not contentious, or remotely relevant.

If you intended to say, "I could provide evidence that there was no election fraud in 2020" without the modifier, I'd have pointed out a logic fail.

I actually don't care about your credentials or your demands of others to see theirs. If you had a cogent argument to rebut the posts, you'd have used it. You didn't so I conclude that you haven't.

"Go away back to mommy." was intended to be dismissive of your person and intellect. It was intentional, for the avoidance of doubt.

/end

I could provide evidence, but you wouldn't accept it anyway. The Attorney General of the USA, picked by Trump, said there was no evidence. Dozens of court cases were brought and lost or dismissed for lack of evidence. Also, you really don't think dozens of reporters were digging into this? Being the first media outlet to definitively prove massive fraud would be a Pulitzer Prize winning article! And, yes, there are still conservative newspapers out there. It has been disproven a hundred ways, so what good would it do to point you to all that evidence?

The author's name is irrelevant.

Speculation? Really? I mean, you do know that eyewitnesses and sworn affidavits are evidence? Ok, glad we cleared that up.

Let's discuss threats: they are wrong, and I do not condone them in any shape, form, or fashion.

That said...

What about the threats and attacks made against the hundreds and hundreds of sworn affidavits of eye-witness testimonies to outright election fraud and massive irregularities?

Or what about attorneys who were supporting the President's Constitutional right to challenge, and who had to step aside for the sake of their careers and their families' safety?

What about the radical Marxist left (ANTIFA or BLM) that verbally and physically attack innocent Americans?

Where's the outrage and voices to these examples of injustice?

Well then apparently you weren't bright enough to know what was happening just by watching the live coverage on every news station on the 3rd. Also many people sent in videos of exactly what this man speaks of even before Trump or anyone ever said a word. The democrat and republican voters were up front and center in counting facilities and polling facilities recording live coverage of what was happening so if that doesn't clue you in nothing will.

You're a complete joke of a human.

LOL, that's a super helpful comment. Thanks for sharing.

Yeah, shame your mom didn't do the planet a favor and visit a Planned Parenthood.

Well, my family is pro-life, so...

Glad you stated, twice, that your analysis "does not prove fraud", a point that a number of us were trying to make. I do hope someone is successful in being able to conduct the detailed forensic analysis just to end all of this speculation.

Looks like Arizona has and Georgia is caught red handed which the tweet above references. I view the data the same way. But why are so many politicians acting like forensic audits are crazy? Why hasn’t America done these before? We spend more money on Jordan’s wall than vetting our own elections. Why? There’s a reason why Obama and Holder forced Dominion into a 50% shareholder of US election software immediately after winning election.

Looks like someone is gullible. The following is in the news.

"The Republican overseeing the controversial GOP-backed election audit in Arizona has reportedly been banned from entering the building where the recount process is ongoing, after he shared some data with experts that showed the results match the officially certified numbers in Maricopa County."

This very long piece stumbles at the outset. "(a) How likely is it that this would have occurred in the normal course of events?" is at best half an answer, useless without also evaluating the answer to the complementary question "(b) How likely is it that this would occur under an abnormal course of events."

Of *course* the probability of (a) is extremely small. So what? Almost any event is extremely unlikely if defined down to the details. The probability of (b) is also extremely small - but in my opinion far more likely than the probability of (a).

But even if you could calculate (a) and (b), which you can't do except to a simplistic and amateurish extent - you'd not have a good case unless you could further evaluate prior probabilities - which is also imponderable.

Both questions (a) and (b) are unanswerable except by stipulating a model: What exactly is meant by "normal course of events"? Does it mean voting is a random draw from a population of ... what? Defined how? What "abnormal" is to be considered? Is it retrofitted to be tailored to what makes the argument best - so if a collection of 100 paranoid conspiracy theorists imagine 100 different ways to manufacture ballots, do sleight of hand, con voters, invade ballot equipment factories in the dead of night - only the 1 hypothetical method that maximizes the observed result is considered?

Then there is the problem of prior probabilities. To do a proper job, it would be necessary to evaluate, for each of the 100 (or really infinitely many) schemes for cheating, the probability from a pre-election point of view that it would take place. What is the probability that a brilliantly secretive and talented cabal would decide to cheat via scheme 52, notwithstanding polls saying Biden will win without cheating, the danger of being imprisoned if caught, and the chance of being caught?
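
To make the roles of (a), (b), and the prior concrete, here is a minimal Python sketch of Bayes' rule for two competing hypotheses. The function name and every number in it are hypothetical placeholders, not estimates for any real election. It only illustrates that a tiny probability under "normal" says little until it is compared with the probability under "abnormal" and weighted by the prior.

```python
def posterior_abnormal(p_obs_given_normal, p_obs_given_abnormal, prior_abnormal):
    """Posterior probability of the 'abnormal' hypothesis via Bayes' rule.

    All three inputs are assumptions supplied by the analyst; the function
    cannot tell you whether they are reasonable.
    """
    prior_normal = 1.0 - prior_abnormal
    numerator = p_obs_given_abnormal * prior_abnormal
    evidence = numerator + p_obs_given_normal * prior_normal
    return numerator / evidence


# Hypothetical case: (a) is tiny, but (b) is ten times tinier.
# The posterior ends up BELOW the 1% prior, despite (a) being "one in a million".
tiny_a_tinier_b = posterior_abnormal(1e-6, 1e-7, 0.01)

# Hypothetical case: (a) and (b) are equal, so the data are uninformative
# and the posterior simply equals the prior.
uninformative = posterior_abnormal(1e-6, 1e-6, 0.01)

print(tiny_a_tinier_b, uninformative)
```

Changing any one of the three inputs changes the answer, which is why each has to be justified by a stated model rather than assumed.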

Amateurs!

Edit: Where I wrote that (b) is more probable than (a), I of course meant to say the opposite.

Voter Integrity, you never even got past my first question. How do you deem your dataset as actually representative of the election returns over the course of the vote count? And if you do deem it representative, why did you remove about 10% of the vote updates? Your dataset includes 9,609 vote updates - you only analyze 8,954 of them.

https://davidmuncier.substack.com/p/mystery-of-voting-integritys-missing

Until you get past that one, you have zero credibility.... As I have shown, there are numerous anomalous vote updates buried in your dataset that you apparently ignore, only to go after anomalies of your choosing. That's not how data science works. Are you incompetent or did you choose to bias your results? Or both?

https://davidmuncier.substack.com/p/what-voting-integrity-must-have-seen

So is your argument that "there is no fraud because I found even more fraud that you didn't find"?

No, my argument is that Voting Anomaly's assumptions about their derived dataset are wrong, leading to very invalid conclusions based on interpreting the data the wrong way. If Voting Anomaly’s data was as they described, you wouldn’t see the larger and uglier anomalies I pointed out. There’s a big difference between an accurate time series history of vote updates, and the aggregation of a bunch of snapshots of county tally boards over time. I can explain more if you are interested, but the bottom line is that anybody who treats this data as if it is a bunch of sequential updates will be wrong with their conclusions.

Vote Integrity (not Anomaly) is not treating the data as if it is a bunch of sequential updates. The time stamp of the update was not mentioned in their study because it does not affect the vote margin of an update or the vote ratio of an update. They looked at all of the updates where each candidate had a net gain in votes and showed that

"At the very least, it is possible to definitively say that Joe Biden’s victory in all three of these states (MI, WI, and GA) relied on four of the seven most co-extreme vote updates in the entire data set of 8,954 vote updates."

Are you saying that their data set was "derived" because they only looked at all of the updates where each candidate had a net gain in votes? It would not be possible to do a study involving vote ratios in updates where one candidate got zero votes due to division by zero.

Also, please tell me what "assumptions" about their data you are referring to.

I don't know why you think that their data was not as they described. The vote update data can be found on the internet and you can redo their study with that data if you want to.

As for the "larger and uglier anomalies" that you pointed out in your paper, some of them deal with negative vote updates which I agree are a problem but cannot be included in a study of the vote ratios because the logarithm of a negative ratio number would be undefined. And I disagree with your premise that vote updates that contribute a large percentage to the total winning margin are necessarily anomalies. If a state updates its vote totals infrequently for any reason then each update will naturally contribute a large portion of the winning margin. If a state only did two vote updates then at least one of the updates would have to contribute at least 50% of the final margin. Also consider the case of a voting race swinging back and forth in a state and the final winning margin ends up being small. In this case updates could conceivably have margins that were greater than 100% of the final winning margin.
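
The arithmetic in the paragraph above can be checked with a small Python sketch. The margins are made-up illustrative numbers, not real election data, and `margin_shares` is a hypothetical helper introduced only for this example.

```python
def margin_shares(update_margins):
    """Return each update's margin as a fraction of the final (summed) margin."""
    final_margin = sum(update_margins)
    if final_margin == 0:
        raise ValueError("final margin is zero; shares are undefined")
    return [m / final_margin for m in update_margins]


# Only two updates: the shares sum to 1, so at least one must be >= 0.5.
two_updates = margin_shares([3000, 7000])

# A race that swings back and forth and ends close (final margin = 1000):
# individual updates can be several times the final margin.
swinging = margin_shares([5000, -6000, 4000, -2000])

print(two_updates, swinging)
```

In the second case the first update alone is five times the final margin, so "share of the final margin" by itself does not separate ordinary updates from anomalous ones.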

If we take Vote Integrity's conclusion to be "At the very least, it is possible to definitively say that Joe Biden’s victory in all three of these states (MI, WI, and GA) relied on four of the seven most co-extreme vote updates in the entire data set of 8,954 vote updates" then how is that conclusion invalid based on the data that they analyzed?

I'm going to start by saying what I said here:

https://davidmuncier.substack.com/p/mystery-of-voting-integritys-missing

You are already wrong ! Voter Integrity's dataset includes 9,609 vote updates or observations, while their charts only contain 8,954 data points. This represents a selective thinning of 655 updates, or selective removal of about 10% of the original data.

The other thing you need to realize is that the dataset is not a cohesive set of updates for each county, but rather snapshots of the state tallies based on data coming via Edison Research. Just to be clear, when I say tally, Voting Integrity's dataset only contains 10 pieces of data: "state", "race_id" (always 2020 Presidential), "timestamp", "vote_share_dem", "vote_share_rep", "vote_share_other", "votes", "expected_votes", "eevp" (% of vote in), "eevp_source" (Edison). You can read more about the provenance of that data here, but suffice it to say that it is NOT a direct view of the data as it comes in...

https://medium.com/@frankaldinger/debunked-why-voter-fraud-detection-using-news-voting-data-is-a-false-narrative-fe212f15d065

And even after Voter Integrity has this dataset, one is only able to "create" updates by subtracting the previous tally from the current tally. This is important since one is deriving the "updates" so every update relies on the accuracy of the previous and current tally - that makes it hard to treat each update as independent when analyzing. That's why Voting Integrity's removal of 10% of the data under the guise of "data cleaning" is just plain suspicious and wrong.

One final note. I have shown in my analysis of all the datapoints that there are far bigger anomalies in states outside of the battleground states that Voting Integrity just threw away. That tells me that they were indeed filtering to show just anomalies in the battleground states.

It was my understanding that the 655 updates (which is 6.8% of 9,609) that Vote Integrity did not include in their report were those where one candidate had a zero vote gain or a vote loss. These could not be included because they would produce mathematical errors such as division by zero or logarithm of a negative number. Are you alleging that in addition to the updates that could not be included for mathematical reasons that Vote Integrity also threw out other updates because they did not like those updates?
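
As a rough illustration of the exclusion described above, here is a Python sketch. It assumes the analysis takes the logarithm of a ratio of per-update gains, as the discussion of vote ratios and logarithms suggests. The sample updates are invented and `usable_log_ratio` is a hypothetical helper, not the report's actual code.

```python
import math


def usable_log_ratio(dem_gain, rep_gain):
    """Return log(dem_gain / rep_gain), or None when it is undefined.

    A zero gain in the denominator would divide by zero, and a zero or
    negative ratio has no real logarithm, so such updates cannot be used.
    """
    if dem_gain <= 0 or rep_gain <= 0:
        return None
    return math.log(dem_gain / rep_gain)


# Invented (dem_gain, rep_gain) pairs: one ordinary, one with a zero gain,
# one with a vote loss, and one perfectly even update.
updates = [(1200, 800), (0, 500), (300, -40), (950, 950)]
kept = [u for u in updates if usable_log_ratio(*u) is not None]

# Share of the full dataset that the 655 excluded updates represent.
excluded_share = 655 / 9609
print(kept, f"{excluded_share:.1%}")
```

Whether every one of the 655 excluded updates fails this kind of test is exactly the open question in this exchange. The sketch shows only what a purely mathematical exclusion would look like.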

I don't agree with you that you have shown in your analysis that there are far bigger anomalies in states outside of the battleground states because I don't agree with how you are defining what an anomaly is. I read your analysis and I will repeat what I said about it before -

I disagree with your premise that vote updates that contribute a large percentage to the total winning margin are necessarily anomalies. If a state updates its vote totals infrequently for any reason then each update will naturally contribute a large portion of the winning margin. If a state only did two vote updates then at least one of the updates would have to contribute at least 50% of the final margin. Also consider the case of a voting race swinging back and forth in a state and the final winning margin ends up being small. In this case updates could conceivably have margins that were greater than 100% of the final winning margin.

As for the Edison data, if it is not based on actual vote counts that go towards determining the election results then Vote Integrity, you, and I are wasting our time talking about it because it doesn't matter anyway.

First off, I'm going to say that you must disagree with Voting Integrity's analysis if you say that "I disagree with your premise that vote updates that contribute a large percentage to the total winning margin are necessarily anomalies." Their whole premise is searching for extremes in vote update size vs. margin differential.

But going beyond that, my main point is that there is no way to selectively "clean" the data as Voter Integrity claims to have, because it is all linked... Once again, Voter Integrity's dataset is not in the form of updates, but rather snapshots of the aggregate vote totals with Edison estimates of everything else - "vote_share_dem", "vote_share_rep", "vote_share_other". There are two derivations needed to convert to a usable update.

1) multiply the total votes in by all three of the vote share estimates to get votes for each candidate.

2) Subtract the previous tally values from the current values to get the delta updates.

I'll mention this one more time for emphasis - because we are forming the updates by this derivation, it is impossible to discard some derived updates that have problems because they are all linked - the data for a "good update" stems from the results of a previous "bad update".

In the process of doing this one gets negative updates, updates where the total votes go down, shift between candidates, and where the change in votes doesn't match the combined change going to each candidate. I originally thought these changes were caused by error corrections, provisional ballots, etc. But after reading about how Edison Research does their analysis, it's clear that the numbers in their dataset start out as estimates and slowly converge on the real results so many of these malformed updates are more a function of the estimates being revised.
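
The two-step derivation described above can be sketched in Python as follows. The field names `votes`, `vote_share_dem`, and `vote_share_rep` come from the dataset description in this thread. The snapshot values, the rounding to whole votes, and the zero starting baseline are assumptions made only for illustration.

```python
def derive_updates(snapshots):
    """Turn cumulative snapshots into per-interval deltas.

    Step 1: multiply total votes by each share to estimate candidate totals.
    Step 2: subtract the previous snapshot's totals to get the "update".
    """
    updates = []
    prev_dem = prev_rep = 0  # assumed baseline before the first snapshot
    for snap in snapshots:
        dem = round(snap["votes"] * snap["vote_share_dem"])
        rep = round(snap["votes"] * snap["vote_share_rep"])
        updates.append({"dem": dem - prev_dem, "rep": rep - prev_rep})
        prev_dem, prev_rep = dem, rep
    return updates


# Invented snapshots: the second one revises the Democratic share downward.
snapshots = [
    {"votes": 100_000, "vote_share_dem": 0.500, "vote_share_rep": 0.480},
    {"votes": 101_000, "vote_share_dem": 0.490, "vote_share_rep": 0.495},
    {"votes": 105_000, "vote_share_dem": 0.495, "vote_share_rep": 0.490},
]
updates = derive_updates(snapshots)
print(updates)
```

In this toy data the second derived update shows a loss of 510 votes for one candidate even though the total rose by 1,000, purely because a share estimate was revised. The third update is computed against that same second snapshot, so dropping the "bad" second update does not make the third one independent of it.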

Given the info on how the Edison estimates work, we, and many others, are all wasting our time trying to find and debate anomalies that are mainly artifacts of the way the data is concurrently collected and estimated. But at least now I understand why the data has so many consistency problems.

Actually, you've completely ignored the even more fundamental debunking of your analysis - that these are not in fact real-time feeds from the tabulation machines, but that they are ad hoc updates from different counties that don't all post their updates in a uniform fashion or with uniform frequency.

Your analysis treated the data as if it was in fact uniform, that there was some underlying consistent methodology to the posting of updates, and that is simply not the case. Edison Research themselves have even said that sometimes counties will put all the votes for one candidate into one update.

Because these are unofficial tabulations. They're not the same as the official tabulations that are used to certify the results. So the grouping of the batches that are reported to Edison aren't the same as the batches of ballots that go through the machine.

What are you talking about?

They analyze the different batch updates irrespective of their uniformity or frequency. They categorize the batches based on how large they are and how skewed they are to one candidate.

Most batches follow a correlation except for the 4 suspicious ones that the author points out.

Unusual events are worth looking into. However, the use of quantitative analysis to buttress the suspicions is not valid. You can't meaningfully talk about probabilities of events that have already happened - the probability of something happening, given that it has occurred, is 1.000 exactly. So, follow up on your suspicions by finding out what actually happened. It should be difficult to hide, and easy to bring to light, some nefarious scheme that would necessarily have involved hundreds of malevolent people coordinating their activities in many different places.

I'm sympathetic to the essence of this criticism, but the meme that "past events have probability 1" is a bête noire of mine. Where does it come from, I wonder? I understand the temptation to fall into it, but it is a fallacy.

"The probability that something has happened, given that it has happened, is 1" is a true statement the way mathematicians use language. However, the condition "given that it has happened" must be taken literally. Just because it is a fact that something has happened doesn't mean that fact is "given." The fact may be, and very often is, unknown. If a jury has to decide whether an accused should be convicted, it is not "given" to them that the accused did the crime whenever the accused did. Rather, the jury must assess the *probability* that the accused is guilty based on evidence presented (known facts). Unknown facts have no role.

Probabilities about future events are typically neither 0 nor 1, but are in between. You might think that's because the future event is not (yet) a fact, or you might notice that a future event is an event about which we have incomplete knowledge. The correct lesson is the second view - probabilities depend on knowledge (sometimes, in the case of a "conditional probability", of stipulated knowledge which is presented, in math-talk, with a "given that ..." phrase). An event, whether past, present, or hypothetical and not moored in time in any way, has a probability that summarizes data/knowledge/information, not facts which may be unknown. Take the "fact" view and you'll have a very useless definition of probability as you will have precluded the concept of conditional probability and you will need to resort to a complicated contortion of language to discuss the concept about a past possible event that the rest of us describe simply with our word "probability."

Thanks for taking the time to respond in detail to my comment. I don't fully understand the response, however. Let me clarify the comment. In the first class of the semester in my statistics course for science majors, I give the students a page of numerical data and tell them to find an unusual pattern and calculate the probability of that pattern. Someone finds four consecutive 6's and correctly gives the probability of that happening as 1 in 10,000, so it seems like a significant pattern. Another student sees a dozen odd numbers in a row and says the chances of that are about 1 in 4,000. I congratulate them on the correct calculations, and then I let them know that the page of 'data' is actually a table of random digits. They understand the issue, even before we discuss the different theories of probability: scanning data looking for patterns means in effect testing and rejecting thousands of possible patterns, so it isn't surprising to find some that appear unlikely. The probability calculation is meaningless in that situation.
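
The classroom example can be made quantitative with a short Python sketch. It computes the exact probability that a run of a given length appears somewhere in a sequence of independent digits. The page size of 5,000 digits is an assumption chosen for illustration, and `prob_run_somewhere` is a hypothetical helper.

```python
def prob_run_somewhere(n, run_len, p):
    """Probability that n independent trials contain at least one run of
    `run_len` consecutive successes, each success having probability p.

    state[j] holds the probability that no full run has occurred yet and
    the sequence currently ends in exactly j consecutive successes.
    """
    state = [1.0] + [0.0] * (run_len - 1)
    for _ in range(n):
        new_state = [0.0] * run_len
        new_state[0] = sum(state) * (1.0 - p)   # a failure resets the streak
        for j in range(1, run_len):
            new_state[j] = state[j - 1] * p     # a success extends the streak
        # Mass in state[run_len - 1] that sees a success completes a run
        # and is dropped from the "no run yet" states.
        state = new_state
    return 1.0 - sum(state)


# Four 6's at four pre-specified positions: 1 in 10,000.
fixed_position = prob_run_somewhere(4, 4, 0.1)

# Four 6's in a row anywhere on an assumed 5,000-digit page.
anywhere_on_page = prob_run_somewhere(5000, 4, 0.1)

# Twelve odd digits in a row at pre-specified positions: 1 in 4,096.
twelve_odds_fixed = 0.5 ** 12

print(fixed_position, anywhere_on_page, twelve_odds_fixed)
```

The gap between the fixed-position figure and the anywhere-on-the-page figure is the effect described above: the chance of finding some such pattern after scanning is far larger than the chance of that pattern at a location named in advance.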

In the real world, the same thing happened in the past with approval of drugs to fight particular diseases. Before current practices were in place, a drug company found a new drug that was significantly better than an existing drug when outcomes were compared 18 days after the new and existing drugs were given to two groups of patients with the disease. However, what they had actually done was to measure outcomes not only on the 18th day, but on every day from day 1 through day 30. They found statistical significance only on day 18 and reported just that result. It's not surprising to find a "statistically significant" difference on one out of many days, even if the actual distributions of outcomes for the two drugs are identical. Now drug companies must register their trials in advance, specifying exactly which outcomes they will compare.

My point is that there is no legitimate meaning to a probability calculation done without having a definite hypothesis in mind before looking at data. It's fine to start out by looking for patterns, but if you find something unusual you can't just go through the motions of calculating a probability and claim to have shown anything. In science, that initial search for patterns is a "pilot study," and it has to be followed up by a completely independent collection of data with a definite hypothesis in mind that will be tested statistically, this time legitimately. I realize it's not possible with elections to run it a 2nd time, but that doesn't mean it is then OK to ignore the requirement for a followup and base everything on the inappropriate probability calculation. You've found something apparently unusual; now put it to the test by investigating what happened - e.g. find a reputable person who was part of the grand scheme that would have to have been perpetrated and who is willing to tell how they did it - if in fact there was any such scheme.

Ok, I understand your train of thought. You have experience with students, for example, trying to follow your instruction to "calculate the probability of that pattern" and coming up with a pointless number. You sense an analogy between the student situation and somebody's fishy voting-pattern analysis, and your diagnosis of where both analyses went wrong is that they are futile computations. I can agree. I can sympathize with the hope to find a pithy sentence explaining what's futile and pointless about them. Where you lose my agreement is in choosing "past events always have probability 1" as the pithy statement. Besides being absurd, it's from thin air. There's no logical or even tempting word-play connection leading to it from the situations it allegedly explains. Note your own words - you congratulated your students on their correct calculations. They did not come up with the probability of 1 which you pithily have claimed is the correct number. Yet you congratulated them.

Your less pithy statement "there is no legitimate meaning to a probability calculation done without having a definite hypothesis in mind before looking at data," though a vast over-generalization, comes closer to a reasonable point. One setting for the situation at hand is that someone notices a pattern that seems remarkable and asks you "What is the chance!?" Typically what the questioner really means is "What is the chance of a pattern so striking that it makes me ask you 'what is the chance?'" - and that sure is a question that eludes precise interpretation or answer (echo of "no legitimate meaning"), though we typically can say that the total chance of all surprising patterns is a lot more than the chance of just this particular one.

In summary, in mathematics we need to reason and speak precisely. It is not mathematics to make some claim that is literally false then plead that "I was getting at such-and-such which seems reminiscent so I blurted."

Thanks for your attention to this - I appreciate the correction and explanation. You are right, the original statement, "the probability of something happening, given that it has occurred, is 1.000 exactly" is not literally or generally true. How about this as a still pithy but correct version: "If one sees that something has happened but the situation does not satisfy the requirements for a valid calculation of probabilities, all one can legitimately say is that it happened."

- Steve George

You're conflating two things here.

The probability that Bob won (or lost) the lottery that he *just* won (or lost) is 1:1.

The probability that Bob would win the Mega Millions lottery is 1:302,575,350.

That he won (or lost) doesn't change the probability that he *might* win, before, during, or after.

Thanks for your comment. It's basically what I was trying to say but I didn't explain it fully. The fact that Bob's win was a priori improbable does not in itself imply that something is amiss, e.g. that he forged his ticket or tampered with the lottery. Any individual's win is improbable, but millions of tickets were sold so the probability that someone will win is high. In the same way, some individual sequences of ballot counting results may look improbable scanning over the outcome data, but there is a huge number of possible legitimate sequences. If AFTER the fact we see an outcome that would have seemed improbable BEFORE the fact, go ahead and investigate it to see if something underhanded took place, but it's not valid to use a probability calculation by itself to make the point, unless there was a specific hypothesis before the election about that particular outcome.
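
The lottery comparison can be put into numbers with a brief Python sketch. It assumes every ticket is an independent, uniformly random pick, which real ticket sales only approximate, and the ticket counts are hypothetical. The 1-in-302,575,350 odds are the figure quoted earlier in this thread.

```python
def prob_at_least_one_winner(tickets_sold, odds_against=302_575_350):
    """Probability that at least one of `tickets_sold` independent random
    tickets hits the jackpot, given 1-in-`odds_against` odds per ticket."""
    p_single = 1.0 / odds_against
    return 1.0 - (1.0 - p_single) ** tickets_sold


# One particular ticket (Bob's): about 1 in 302.6 million.
bob = prob_at_least_one_winner(1)

# Hypothetical drawing where as many tickets are sold as there are
# combinations: roughly a 63% chance that somebody wins.
somebody = prob_at_least_one_winner(302_575_350)

print(bob, somebody)
```

How likely it is that somebody wins depends heavily on how many tickets are in play. The same holds for how many vote updates and candidate patterns are scanned before one of them is singled out as improbable.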

You don't have to make the hypothesis before the event. There are entire fields of study, from Big Data to forensics that operate on data after events (sometimes by years).

You should do it, if possible, to ensure that the right KPIs are being measured, but it is not a strict requirement to conduct an analysis of data.

All I can do is to repeat a simple fact about data analysis: a probability calculation that would be valid when made before examining data cannot legitimately be applied to a pattern found only after looking through data. Doing so would be a classic case of the "multiple comparisons" fallacy. If one does attempt to analyze already-collected data, one needs to use a modified probability calculation, which was not done in the election analyses I was criticizing.

Your application of multiple comparisons doesn't seem to align with my understanding of the topic, but I will admit that I graduated decades ago.

Hi, Vote Integrity. We analyzed your research (among many others). Think the election was rigged? We’ll challenge anyone who believes Trump was the true winner to a debate judged by impartial experts that we’ll choose together. The stakes: $100,000.

https://rootclaim.com/analysis/was-there-widespread-fraud-in-the-2020-us-election

Nice work! You should share a project at a repository! Most of us probably realize it doesn't take a doctorate and fancy credentials to do data analysis, but does take some coding skills and access to the data! Thanks for this article!

It's the data that's important, not the author. Take what is presented and go out to confirm it or disprove it. The information presented here has a logical pattern that can be investigated, challenged and tested. Laziness to discard it over not knowing the author...the author is not the most burning question.

OP, are you still around? Any updates?

Finally found this after switching to DuckDuckGo... it’s a shame Google hid these posts.

Voter Integrity, again I ask you sir, are you aware of the work of Edward Solomon? https://www.youtube.com/channel/UCIxc8YMkny2KBaD5TQsSbpg

You should get in contact with him. edwardkingsolomon@gmail.com

I agree with Dennis that the Gofile.io mentioned in para. 2 is missing. Do any of you know how I could get it? Thanks.

I believe your probability calculation for footnote [7] is twice what it should be. It should be .0837%. The difference between your .167% and my .0837% is your use of the binomial (2 1), where it should be (2 2) (i.e., 2 choose 2). My calculation can be supported by how you calculated footnotes [8] and [9].

In the second paragraph, you provide a link to an upload at Gofile.io but it's not there, it shows the message: "This upload does not exist." So either the link is wrong/contains a typo, or Gofile is part of the Woke Tech crowd.

In framing the statistical question, how did you decide to focus on the probability of the updates occurring in those 3 states (versus those 3 plus PA, or plus AZ, etc.)? The more battleground states you add in, the less impressive the probabilities become.

There is a thing called common sense. When people let data and science clog the obvious that common sense provides, there is a serious problem. I watched the Nov 3 elections, just like I have every election day/night since I was 10. You can sit there all day long and spout out science and data and talk the intelligent talk. Doesn't matter one bit, because your eyes see it happening, and then 3 other people in your household in separate rooms see it happen, then you all get pen and paper, return to your separate rooms so no one is swaying anyone else, and as the totals start to roll again you're writing the numbers down.

Unless your scientific words can tell me that counting votes results in a candidate's votes decreasing, then you have no argument.

But even more so, unless your definition can also explain how the number of votes that decreased on one candidate was the exact number of votes the other candidate's votes increased in every situation, you have no argument.

Unless you can explain to me with all the laws of the Universe that for the first time in election history that not only did states stop counting but 4-5 states, with different populations, different geographical characteristics, with different numbers of how many postal offices they have, etc. all happen to stop at the same exact time but not only that for the same exact reason. So are we to believe that 5 states stopped counting not only at the same exact second but that they all ran out of ballots at the same exact time? You would have to be a complete and utter moron to believe that in any capacity. But also I would like to know when it started that a state takes the day off
