Statistical anomalies in Russian elections
In Russia, data from each polling station are freely available online after each election and include the number of registered voters, the number of people who participated in the election, and the number of ballots cast for each candidate. We can apply statistical analysis to these data to see if there are irregularities, which may serve as evidence of falsifications.
Arguably the two most important numbers that describe an election outcome are turnout percentage and leader’s result percentage (with leader’s result referring to Putin during presidential elections and the ruling United Russia party during parliamentary elections). These percentages are not reported in the data sets from individual polling stations but can be calculated from the information provided officially.
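The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function and argument names are hypothetical, and we assume that leader’s result is taken relative to the number of ballots cast, not relative to the number of registered voters.

```python
def station_percentages(registered, ballots_cast, leader_votes):
    """Return (turnout %, leader's result %) for one polling station.

    Assumption: leader's result is the share of ballots cast that went
    to the leader; turnout is the share of registered voters who voted.
    """
    if registered <= 0 or ballots_cast <= 0:
        raise ValueError("need positive counts to form percentages")
    turnout = 100.0 * ballots_cast / registered
    leader_result = 100.0 * leader_votes / ballots_cast
    return turnout, leader_result


# Hypothetical station: 1755 registered voters, 1492 ballots, 1000 for the leader.
turnout, leader_result = station_percentages(1755, 1492, 1000)
print(f"turnout {turnout:.2f}%, leader's result {leader_result:.2f}%")
```

The three counts in the usage line are made up for illustration and do not come from any real polling station.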
We (and others) have previously argued that due to human attraction to round numbers, large-scale attempts to manipulate reported turnout or leader’s results would likely show up as frequent whole (integer) percentages in the election data [KSP16a, Roz17]. In a previous Significance article, we gave the hypothetical example of a polling station with 1755 registered voters [KSP16b]. Here election officials decide to forge the results and report a turnout of 85%. They choose 85% because it is a round number which is more appealing than, say, 83.27%. As we explained: “To achieve a falsified turnout of 85%, this polling station needs to report $1755 \times 0.85 \approx 1492$ ballots cast… Note that the number 1492 is not remarkable in itself; it is only the resulting percentage value (i.e. the 1492/1755 ratio) that is round.” Other polling stations making similar attempts at fraud may also choose 85% as their target value, so that when we look at the turnout percentages for all polling stations, we see a noticeable spike in the number of stations with turnout of 85%.
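The arithmetic of this hypothetical example can be checked with a short Python sketch. The function names are ours, and the tolerance of 0.05 percentage points used to decide whether a percentage counts as “integer” is an assumption made for illustration, not a definition taken from this article.

```python
def ballots_for_target(registered, target_percent):
    """Ballot count a station would report to hit a target turnout."""
    return round(registered * target_percent / 100)


def percentage(numerator, denominator):
    """Plain percentage, e.g. ballots cast over registered voters."""
    return 100.0 * numerator / denominator


def is_near_integer_percentage(numerator, denominator, tol=0.05):
    """True if the percentage is within `tol` points of a whole percent.

    Assumption: `tol` is an illustrative threshold, not the article's.
    """
    pct = percentage(numerator, denominator)
    return abs(pct - round(pct)) <= tol


ballots = ballots_for_target(1755, 85)
print(ballots)                                   # 1492
print(f"{percentage(ballots, 1755):.2f}")        # 85.01
print(is_near_integer_percentage(ballots, 1755)) # True
```

The reported count itself looks unremarkable; only the ratio gives it away, because counts are whole numbers and the achievable turnout closest to 85% at this station is about 85.01%.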
In our previous article, we found these integer peaks for elections from 2004 to 2012. Since then, two new elections have been held in Russia: the 2016 parliamentary elections and the 2018 presidential election. These give us the opportunity to again test our hypothesis (see github.com/dkobak/elections for data and code).
Figure 1 shows histograms for turnout and leader’s result at polling stations in the two most recent elections. As with previous elections, sharp periodic peaks are clearly visible at integer values (such as 91%, 92% and 93%) and at round integer values (such as 80%, 85% and 90%), rather than fractional values (such as 91.3%).
Having again identified peaks at integer values, we wondered whether there had been any change in their prevalence over time, particularly given the scrutiny applied to past elections by the media and academics. To address this question, we compared the number of polling stations with integer percentages for turnout or leader’s result against the number that would be expected by chance. The expected values were computed by Monte Carlo simulations of election results using the binomial distribution of ballots at every polling station.
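A minimal sketch of such a simulation, for turnout only, might look as follows. It rests on several assumptions of ours that the text does not spell out: each station’s simulated ballot count is drawn from a binomial distribution with the number of registered voters as the number of trials and the observed turnout as the success probability, and a percentage counts as “integer” if it lies within a hypothetical tolerance of 0.05 points of a whole percent. The code published alongside the article may differ in its details.

```python
import numpy as np


def count_integer(numer, denom, tol=0.05):
    """Count stations whose percentage is within `tol` of a whole percent."""
    pct = 100.0 * np.asarray(numer) / np.asarray(denom)
    return int(np.count_nonzero(np.abs(pct - np.rint(pct)) <= tol))


def simulate_integer_counts(registered, ballots, n_sim=200, tol=0.05, seed=0):
    """Return (observed count, array of simulated counts) for turnout.

    Assumption: every station must have at least one registered voter
    and no more ballots than registered voters.
    """
    rng = np.random.default_rng(seed)
    registered = np.asarray(registered, dtype=np.int64)
    ballots = np.asarray(ballots, dtype=np.int64)
    p = ballots / registered  # observed turnout used as binomial probability
    simulated = np.empty(n_sim, dtype=np.int64)
    for i in range(n_sim):
        fake_ballots = rng.binomial(registered, p)  # one simulated election
        simulated[i] = count_integer(fake_ballots, registered, tol)
    return count_integer(ballots, registered, tol), simulated


def integer_excess(registered, ballots, **kwargs):
    """Observed number of integer stations minus the simulated mean."""
    observed, simulated = simulate_integer_counts(registered, ballots, **kwargs)
    return observed - simulated.mean()
```

Calling `integer_excess(registered, ballots)` with two equal-length arrays of per-station counts returns the excess over chance. Leader’s result could be treated in the same way, with ballots cast as the number of trials and the observed leader’s share as the probability.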
Figure 2 shows the excess of polling stations with integer values for all elections between 2000 and 2018. Before 2004, the number of polling stations with integer values for either turnout or leader’s result (the blue curve) was close to that expected by chance, indicated by the null hypothesis value of 0 excess on the $y$-axis. From 2004, the excess of integer polling stations increased, spiking in 2008 and dropping back in 2011. Since then, it has steadily increased to levels last seen in 2008.
The excess of polling stations with integer values for turnout or leader’s result at both the 2016 and 2018 elections is far beyond what we would expect to find as the result of random chance. The grey shading in the figure shows the range up to the 99.9th percentile of values obtained in our Monte Carlo simulations; any individual value above this shaded area has a probability of less than one in a thousand ($p < 0.001$) under the statistical model used.
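To make the percentile band and the probability statement concrete, here is a hedged Python sketch that summarises a set of simulated counts. The function name and the returned keys are hypothetical, and the p-value formula is the common add-one Monte Carlo estimator, which is our choice and not necessarily the one behind the figure.

```python
import numpy as np


def monte_carlo_summary(observed, simulated, percentile=99.9):
    """Compare an observed count with counts from simulated elections."""
    simulated = np.asarray(simulated, dtype=float)
    expected = simulated.mean()
    # Upper edge of the band, expressed as an excess over the mean.
    band_upper = np.percentile(simulated, percentile) - expected
    # One-sided empirical p-value with the add-one correction.
    n_as_extreme = np.count_nonzero(simulated >= observed)
    p_value = (1 + n_as_extreme) / (1 + simulated.size)
    return {
        "excess": observed - expected,
        "band_upper": band_upper,
        "p_value": p_value,
    }
```

With this estimator the smallest attainable p-value is 1/(N + 1) for N simulations, so at least 1000 simulated elections are needed before a value below 0.001 can be reported.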
It is also instructive to look separately at the excess of polling stations with integer values for turnout (green curve) and leader’s result (red curve). For the latter, the excess has stayed constant since 2011. However, the former has kept growing and reached its historic maximum in 2018. If these results are taken as evidence of electoral data manipulation, this might suggest that recent efforts to manipulate results have focused on turnout, presumably because the leader’s result was believed to be high enough already.
As we have shown previously [KSP16a], integer peaks in the election data do not originate uniformly across all parts of the country; they are mostly localised in the same administrative regions, providing additional evidence in support of our hypothesis that these are not natural phenomena. Specific integer peaks can sometimes be traced to a particular city, or even an electoral constituency within a city, where turnout and/or leader’s result are nearly identical at a large number of polling stations [KSP16b].
The most prominent example from the last two elections was the city of Saratov in 2016. Its polling stations are the sole contributor to the sharp turnout peak at 64.3% and the leader’s result peak at 62.2%, both visible in Figure 1. These peaks are not integer and so are not counted towards the anomalies computed for Figure 2. Curiously, their product (the fraction of leader’s votes relative to the total number of registered voters) is $0.643 \times 0.622 \approx 0.400$, that is, 40.0%, which is exactly round.
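The reason the product has this interpretation can be written out in one line. The symbols are our own notation, not the article’s: $V$ is the number of registered voters at a station, $B$ the number of ballots cast and $L$ the number of votes for the leader.

```latex
\[
\underbrace{\frac{B}{V}}_{\text{turnout}}
\times
\underbrace{\frac{L}{B}}_{\text{leader's result}}
= \frac{L}{V},
\qquad
0.643 \times 0.622 = 0.399946 \approx 0.400 .
\]
```

The ballots cast cancel, leaving the leader’s votes as a share of all registered voters, which for the Saratov peaks comes to 40.0% at the precision reported.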
We thank Brian Tarran (Significance magazine) for his valuable help in editing this article.