Guest Post by Carl Bell – Originally on Revolver News

Carl Bell holds a Ph.D. and teaches in a field relevant to statistical analysis at a North American university.

Identifying Voter Fraud through Suspicious Birthday Distributions in Pennsylvania Voter Registration Data, and the Effect on the 2020 Pennsylvania Presidential Election

Short Summary:

We construct a new metric of potential voter fraud using suspicious distributions of birthdays in Pennsylvania voter registration data. The basic idea is that people picking fake birthdays will make predictable non-random choices, like picking round numbers for days of the month, and will not know what true birth-month distributions look like.

Under this metric, a number of counties in Pennsylvania have extremely unlikely distributions of voter birthdays. Seven counties representing almost 1.4 million votes total (Northumberland, Delaware, Montgomery, Lawrence, Dauphin, Lehigh, and Luzerne) have suspicious birthdays above the 99.5th percentile of plausible distributions, even when using conservative assumptions about what these distributions should look like.

These suspicious birthdays also matter significantly for election outcomes. While there are suspicious counties that vote Republican overall, in general more suspicious birthdays in a county are strongly associated with a larger Biden vote share, and a higher Biden vote share relative to all Democrat presidential candidates since 2000. More suspicious birthdays are also associated with a higher vote share for Jorgensen relative to Trump (consistent with a fraud scheme aiming to get Biden high but not “too high”, while simultaneously giving as few votes to Trump as possible). 

Finally, we quantify how much this potential fraud may have impacted the election. Even a small reduction in the number of suspicious birthdays (to the 98th percentile of the conservative distribution) would be predicted to have resulted in Trump winning the state by 71,500 votes. This suggests that whatever is driving the anomalous patterns in birthdays is sufficiently important to affect the statewide election result.

Executive Summary:

We use a largely ignored data source to identify suspicious voter registrations by county, a data source that is independent of the actual vote outcomes. In other words, we will construct metrics that identify counties that show indications of potential voter fraud regardless of who a county is voting for. Then, once this is done, we will show how these measures correlate with vote outcomes.

Our key insight is that someone making up fake birthdays for voter registrations is unlikely to be able to do so in a truly random manner. Instead, we identify several likely hallmarks of fake birthdays:

-They are likely to excessively cluster on round number days of the month (1, 10, 15, 20, 30, 31), since people generally overweight round numbers. 

-They are likely to excessively cluster on January and December for the same reason.

-They are likely to excessively cluster on months of the year which in general have few birthdays in overall demographic data (i.e. fake birthdays will be drawn roughly evenly across months, subject to the round number effect above, while true birthdays tend to cluster more in certain months like July and August, and less in months like February and November).

We call these “suspicious birthdays” – individually any one person can easily have any of the traits above, but having too many overall in a county suggests that fake birthdays have been added to the pool. We take these three measures of suspicious birthdays, and evaluate them against a combination of two types of benchmarks of what might be expected in the absence of fraud. These are designed to ensure that any unusual patterns do not arise from other causes (e.g. births generally avoiding holidays, or people generally having sex more at certain times of the year):

-A “best guess” benchmark, where we compare each county to overall demographic data:
Historical day-of-the-month distribution of birthdays from the Social Security Master Death File, and
Historical month-of-the-year distribution of birthdays for that county from the National Center for Health Statistics

-A “conservative” benchmark, where we also add in measures of each county relative to the distribution of all voter birthdays in the state of Pennsylvania. This has the effect of measuring how unusual each county looks just compared to other counties, and so effectively strips out the average level of fraud across all counties. 
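
As a rough illustration of how the first two hallmarks could be checked against a benchmark, the Python sketch below flags round-number days and January/December birthdays, then scores a county's count of flagged birthdays against a benchmark rate using a normal approximation to the binomial. The function names, the normal approximation, and the example numbers are assumptions made for illustration only; they are not the procedure or the data behind the percentiles reported here.

```python
from datetime import date
from math import erf, sqrt

ROUND_DAYS = {1, 10, 15, 20, 30, 31}  # round-number days listed in the text
EDGE_MONTHS = {1, 12}                 # January and December


def suspicious_flags(birthday: date) -> dict:
    """Flag the round-day and January/December hallmarks for one birthday."""
    return {
        "round_day": birthday.day in ROUND_DAYS,
        "edge_month": birthday.month in EDGE_MONTHS,
    }


def excess_percentile(observed: int, total: int, benchmark_rate: float) -> float:
    """Percentile of an observed count of flagged birthdays, assuming the
    count is binomial with the benchmark rate (normal approximation)."""
    expected = total * benchmark_rate
    sd = sqrt(total * benchmark_rate * (1 - benchmark_rate))
    z = (observed - expected) / sd
    # Standard normal CDF evaluated at z
    return 0.5 * (1 + erf(z / sqrt(2)))
```

For example, `excess_percentile(250, 1000, 0.20)` asks how unusual 250 flagged birthdays out of 1,000 registrations would be if the benchmark rate were 20%; those inputs are made up. The third hallmark (months with few real births) would need the month-of-the-year benchmark data and is left out of this sketch.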

We find that even under the conservative distribution, seven counties representing almost 1.4 million votes total (Northumberland, Delaware, Montgomery, Lawrence, Dauphin, Lehigh, and Luzerne) have numbers of suspicious birthdays above the 99.5th percentile of plausible distributions. This represents the average abnormal metric across six different ways of measuring suspicious birthdays. In other words, these counties are not abnormal just along one or two measures, but across the whole range of them. The three worst offenders, Northumberland, Delaware and Montgomery, are above the 99.97th percentile, the 99.91st percentile, and the 99.74th percentile respectively, a result extremely unlikely to occur by chance. Montgomery also has significant evidence of voter fraud across entirely separate measures. Meanwhile, 15 counties score above the 95th percentile of abnormal birthdays on average, and these represent almost 3.5 million votes (the additional eight are Berks, Northampton, Cumberland, Bucks, Philadelphia, Monroe, Lancaster and Erie). Recall that these measures are under the conservative benchmark – under the best guess benchmark, the deviations look even more extreme.

Next, having identified measures that indicate potential voter fraud, we show that they also make a large difference to county election outcomes. A greater level of suspicious birthdays in a county is significantly related to higher vote share for Biden. A one standard deviation increase in suspicious birthdays is associated with a 6.8 percentage point increase in the two-party vote share for Biden. The probability of observing such a relationship by chance (i.e. the p-value) is less than 0.000008. 
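
To make the phrase "a one standard deviation increase is associated with an X percentage point increase" concrete, here is a minimal sketch of a least-squares slope computed on a standardized predictor. The helper names and the county values are hypothetical and invented for illustration; the sketch does not reproduce the regression specification or the p-value reported above.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up county values, for illustration only
scores = [0.2, 0.5, 0.9, 0.4, 0.7]            # suspicious-birthday score
biden_share = [30.0, 42.0, 61.0, 38.0, 50.0]  # two-party vote share, in percent

# Because the predictor is standardized, the slope reads as
# "percentage points of vote share per one standard deviation of the score".
slope_per_sd = ols_slope(standardize(scores), biden_share)
```

Standardizing the score first is a design choice that lets the slope be read directly in the units used in the text.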

It is worth noting that abnormal counties are not exclusively majority Democrat – indeed, the most extreme county in suspicious birthdays, Northumberland, voted almost 70% for Trump. Lawrence, Luzerne and Berks, also majority Republican, look suspicious as well. This is consistent with a genuine measure of fraud – it would be highly surprising if fraud were a uniquely Democrat phenomenon. 

Nonetheless, a higher likelihood of fraud is strongly associated with more votes for Biden. Out of the 13 counties in Pennsylvania that voted majority Biden, 9 are above the 95th percentile of suspicious birthdays.

Second, we show that more suspicious birthdays are also associated with a county having a higher presidential Democrat vote share relative to all previous elections in recent history. The p-value for this relation is less than 0.003, and a one standard deviation increase in abnormal birthdays is associated with a 2.4 percentage point increase in Biden’s vote share relative to all recent past elections. There are 5 counties out of 67 where Biden’s two-party vote share exceeded the performance of the Democratic candidate in all presidential elections since 2000 – Montgomery, Delaware, Cumberland, Allegheny, and Chester. Of these, three are above the 98th percentile of the suspicious birthday distribution.

Third, more suspicious birthdays are also associated with higher vote shares for Jorgensen relative to Trump. This is an additional likely consequence of fraud – if a perpetrator wants to maximize the overall contribution to Biden over Trump in the statewide race, and doesn’t want to report an implausibly high overall vote for Biden, the only alternative is to add votes to Jorgensen.

Finally, we can use these results to estimate the likely effect of suspicious birthdays on the overall Pennsylvania election outcome. Because someone making up birthdays will not always select, for example, round days in January or December, the actual numbers of excess suspicious birthdays are likely to considerably understate the magnitude of possible fraud. As a result, we use the relationship between excess birthdays and Biden vote share to estimate the effect of a change in the magnitude of suspicious births on county vote outcomes.

In particular, we consider what would happen if the ten counties that scored above the 98th percentile of suspicious birthdays under the conservative distribution were instead exactly at the 98th percentile. This would still leave these counties looking very suspicious, just less so. Even this minor change would result in an additional predicted 76,600 votes for Trump and the same number fewer for Biden, which would be enough to swing the state to a Trump win overall by about 71,500 votes (as Biden has a lead of 81,660 votes).
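
The arithmetic behind that figure can be checked in a few lines, using only the two numbers quoted above. The variable names are ours.

```python
biden_lead = 81_660     # Biden's statewide lead, as quoted above
votes_shifted = 76_600  # predicted votes moved from Biden to Trump

# Each shifted vote is removed from Biden and added to Trump,
# so the margin moves by twice the number of shifted votes.
trump_margin = 2 * votes_shifted - biden_lead
print(trump_margin)  # 71540, which rounds to the 71,500 cited
```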

These results suggest strongly the presence of abnormal birthday distributions consistent with a large number of fraudulent voter registrations. They also provide strong evidence that the presence of such abnormal birthdays is positively associated with more votes for Biden, including at historically anomalous levels. Finally, the magnitude of these suspicious birthdays is plausibly large enough to affect the entire statewide outcome of the Pennsylvania presidential election vote.

For a Detailed Description of Analysis – Visit Revolver News