Since Electronic Voting Machines (EVMs) were introduced in India in the 1999 elections, there have been questions about their transparency and trustworthiness; a number of security vulnerabilities have been documented ([wolchok2010security]). In 2013, the Indian Supreme Court ruled that all EVMs in Indian General Elections must be equipped with printers providing Voter-Verifiable Paper Audit Trails (VVPATs, [IndiaSCVVPATs]). The Election Commission of India has introduced VVPAT-equipped EVMs in several constituencies and has promised that all EVMs used in the upcoming 2019 General Election will have VVPAT printers.
VVPATs allow each voter to verify that his or her intended selections are correctly printed on a paper record, which is collected in a separate container called the VVPAT box. VVPATs provide a way to check and correct election results, for instance, if there is a legal demand by a candidate, or for routine checks of election tabulation accuracy—audits. VVPATs could be manually recounted to check the electronic results, but that is labor-intensive and time-consuming. We show how auditing a random sample of VVPAT records can justify confidence in election results without a full manual tally. Auditing a VVPAT means manually inspecting the paper record to see the voter preferences it shows. Different auditing strategies—for instance, ballot-level comparison audits and ballot-polling audits—use that information differently, as described below.
The Election Commission (EC) of India is taking steps to increase the transparency of the Indian electoral system. According to a recent news report (http://indianexpress.com/article/india/ec-to-tally-paper-trail-slips-with-evms-in-5-pc-booths-in-each-assembly-seat-4737936/), the EC decided to tally the paper trail slips and compare them with the electronic result provided by the EVMs in 5% of the booths in each Assembly seat, selected randomly. This effort, while well-intentioned, does not suffice to give strong evidence that election results are correct. In this paper, we show rigorous ways of attaining well-defined confidence levels.
Suitable post-election audits may justify confidence of voters, candidates, and parties that election results are correct. One type of post-election audit is a Risk-Limiting Audit (RLA), which either develops strong statistical evidence that the reported outcome is correct, or corrects the results (by conducting a full manual tally of a reliable paper trail). Here, “outcome” means the set of reported winners of the contests, not the exact vote tallies. To ensure that the tallies are correct to the last vote is prohibitively expensive, if not impossible; conversely, to ensure that the reported winners really won seems like the lowest reasonable standard for accuracy.
Before a RLA commences, the risk limit (denoted $\alpha$) must be chosen; ideally, it is set in legislation or regulation, so that auditors cannot manipulate the level of scrutiny a contest gets by adjusting the risk limit. The risk limit is the maximum probability that the audit will fail to correct the reported election outcome, on the assumption that the reported outcome is wrong. The risk limit is a worst-case probability that makes no assumption about why the outcome is wrong, e.g., it could be because of accidental error, procedural lapse, bugs, misconfiguration, or malicious hacking by a strategic adversary who knows how the audit will be conducted. RLAs assume that the paper ballots reflect the correct outcome, i.e., that a full manual tally of the paper trail would show who really won. A RLA of an unreliable paper trail is “security theater.” Hence, there need to be procedures (called compliance audits by [benaloh2011soba, LindemanStark2012, stark2012evidence, stark2018]) to ensure that the paper trail is complete and intact before the RLA begins.
This paper shows how two types of RLAs can be used with Indian elections: transitive ballot-level comparison audits and ballot-polling audits. Ballot-level comparison audits require a “commitment” to the interpretation of each ballot in a way that allows that interpretation to be compared to a human reading of voter intent directly from the paper ballot. Ballot-polling audits do not require knowing how individual ballots were interpreted. Ballot-level comparison audits are more efficient in the sense that they generally involve inspecting fewer ballots to attain the same risk limit when the reported outcome is correct. However, they require more setup. As discussed in Section 3, they may require a voting system that can export its interpretation of individual ballots in a way that can be matched to the corresponding paper, or may require sorting the physical ballots or VVPATs before the audit, according to the votes they (reportedly) show.
Our main contribution is to develop RLAs for a new social choice function—Indian parliamentary majorities—with procedures suited to the logistics of Indian elections. To verify the overall election outcome we need to verify that the party/coalition reported to have been elected to form the government actually won. That generally requires less auditing than confirming the winner in every constituency. The method we develop splits the responsibility of the auditing among various constituencies in a way that the combined result gives higher confidence in the correctness of the overall parliamentary outcome than each constituency would have in its results alone. This procedure is discussed in Section 4. Our methods apply to any parliamentary democracy, but the computations are particularly simple when all constituencies have equal weight.
2.1 Indian Elections
Indian General Elections are held every five years to elect the Lok Sabha (Lower House of the Parliament). The country is divided into 543 constituencies, each represented by one person elected to the Lok Sabha. Elections at the constituency level are plurality contests: the person who gets the most votes wins. Candidates at the constituency level typically belong to some political party, but can be unaffiliated with any party. At the parliamentary level, the party that gets a majority of the seats forms the government. If no party has a majority, parties may form coalitions to attain a majority. Coalitions can be formed before or after elections, although before is more common. Elections are conducted in phases spread over a month. Each phase consists of single-day elections in a subset of constituencies, typically grouped by geography.
2.2 Related work on election auditing
RLAs are procedures that guarantee a minimum chance of conducting a full manual tally of the voter-verifiable records when the result of that tally would belie the reported outcome. They amount to a statistical test of the null hypothesis that the election outcome is wrong, at significance level $\alpha$, the risk limit, chosen in advance. A RLA continues to examine more ballots until the null hypothesis is rejected at significance level $\alpha$, allowing one to conclude with known confidence that the outcome is in fact correct, or until there has been a complete manual tally to set the record straight. The risk limit is the largest chance that the audit will not require a full manual tally of the paper records if the electoral outcome according to that tally would differ from the reported electoral outcome.
RLAs were introduced by [Stark2008a], but were not so named until [Stark2009b]. The first RLAs were conducted in California in 2008 ([HallEtal2009]). RLAs have been conducted in California, Colorado, Indiana, Michigan, New Jersey, Ohio, Virginia, and Denmark. RLAs have been developed for a variety of social choice functions and for a variety of sampling strategies (unstratified sampling of individual ballots or batches, with or without replacement; stratified sampling with and without replacement; Bernoulli sampling) and auditing strategies (batch-level comparisons, ballot-level comparisons, and ballot polling).
Ballot-polling audits ([Lindeman2012, LindemanStark2012, bernhardEtal2019]) do not require knowing how the system interpreted individual ballots or how it tallied the votes on subsets of ballots. They directly check whether the reported winner(s) received more votes than the reported loser(s) by sampling and manually interpreting individual ballots. Drawing a random sample of ballots typically involves a ballot manifest, which describes how the physical ballots are organized: the number of bundles, the labels of the bundles, and the number of ballots in each bundle. (However, see [bernhardEtal2019].)
The BRAVO ballot-polling method ([Lindeman2012]) uses Wald’s sequential probability ratio test ([Wald1945]) to test a collection of null hypotheses, namely, that any loser in fact tied or beat any winner. For each (winner, loser) pair, the audit tests the hypothesis that the loser got as many or more votes than the winner. The audit stops (short of a full hand count) only when there is sufficiently strong evidence that every winner beat every loser.
Comparison audits involve manually checking the voting system’s interpretation of the votes on physically identifiable subsets of ballots. They require the voting system to export vote tallies for physically identifiable subsets of ballots, so that the votes on those ballots can be tallied by hand and compared to the voting system’s tallies. They also require checking that the reported subtotals yield the reported contest results, and that the subtotals account for all ballots cast in the contest. They generally also require ballot manifests.
A comparison audit that checks the voting system’s interpretation of individual ballots is a ballot-level comparison RLA. Ballot-level comparison RLAs are more efficient than batch-level comparison RLAs and ballot-polling RLAs in that they generally require examining fewer ballots when the reported outcome is correct. However, they have higher set-up costs and require more data export from the voting system: they need a cast vote record or CVR for each physical ballot, a way to locate the CVR for each physical ballot, and vice versa. (A CVR is the voting system’s interpretation of voter intent for a given ballot.) Relying on more general results in [Stark2009b] for batch-level comparison audits, [Stark2010] developed a sequential ballot-level comparison RLA method that results in particularly simple calculations.
Transitive audits ([calandrino2007machine, LindemanStark2012]) involve auditing an unofficial system that is easier to audit than the official system. If the two systems agree who won, an audit that provides strong evidence that the unofficial system found the correct winner(s) transitively provides strong evidence that the official system did also; and if the audit of the unofficial system leads to a full manual tally, the outcome of that tally can be used to correct the official result. A transitive audit does not confirm that the official system tallied votes correctly; indeed, the two systems might disagree about the interpretation of every ballot, but still agree who won.
Indian EVMs do not create CVRs, but, if the VVPATs are organized appropriately, they can still be audited using a transitive ballot-level comparison audit. CVRs can be constructed for EVMs by sorting the VVPATs into bundles that (purportedly) show the same voter preferences, counting the number of VVPATs in each bundle, and labeling each bundle with the number of ballots and the voter preferences it purports to contain. A report of the bundle labels, the number of VVPATs in each bundle, and the reported voter preference for the bundle amounts to a CVR for every VVPAT. Such a report in effect combines a ballot manifest ([LindemanStark2012]) and a commitment to a cast vote record for every ballot, implied by the label of the bundle the ballot is in. We shall call such a report a preference manifest.
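To make the idea of a preference manifest concrete, here is a minimal Python sketch that groups VVPAT votes into homogeneous bundles and expands the result into one implied CVR per VVPAT. The `Bundle` record, the function names, the label format, and the cap of 50 VVPATs per bundle are hypothetical choices for illustration; in practice the votes come from physically sorting the paper slips, not from a list in memory.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    label: str       # label physically written on the bundle (hypothetical format)
    count: int       # number of VVPATs in the bundle
    preference: str  # the single vote every VVPAT in the bundle purportedly shows


def build_preference_manifest(vvpat_votes, booth_id, bundle_size=50):
    """Group VVPAT votes into homogeneous bundles of at most bundle_size slips."""
    piles = Counter(vvpat_votes)  # one pile per voter preference
    manifest = []
    for preference in sorted(piles):
        remaining, index = piles[preference], 0
        while remaining > 0:
            count = min(bundle_size, remaining)
            manifest.append(Bundle(f"{booth_id}-{preference}-{index}", count, preference))
            remaining -= count
            index += 1
    return manifest


def implied_cvrs(manifest):
    """One implied CVR per VVPAT: (bundle label, position in bundle, preference)."""
    return [(b.label, i, b.preference) for b in manifest for i in range(b.count)]


# Hypothetical booth with 200 VVPATs.
votes = ["Priya"] * 120 + ["Shyam"] * 70 + ["Ramith"] * 10
manifest = build_preference_manifest(votes, "B17")
cvrs = implied_cvrs(manifest)
```

Here the 120 Priya slips become three bundles (50, 50, 20), so the manifest lists six bundles and implies 200 CVRs; each CVR is located by its bundle label and position.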
If ballots are sufficiently simple—e.g., if each contains only one contest, as in India—sorting ballots by voter preference can be practical. Indeed, this is how ballots are tallied manually in Denmark: on election night, ballots are sorted within polling places according to the voter’s party preference. The following day, ballots are sorted further according to the voter’s candidate preference, to produce homogeneous bundles of ballots, each labeled with the number of ballots and the voters’ preference.
Such sorting-based CVRs were the basis of a RLA in Denmark ([Schurmann2016]). The sorting might be manual, as it is in Denmark, but it could be automated partly or entirely. (Sorting may also increase vote anonymity by breaking any link between voter and ballot.) When the official tallying process itself is based on creating and counting the homogeneous bundles, as it is in Denmark, the audit is a direct audit of the voting system. If the sorting is conducted independently of the tabulation, as it would be if India were to sort the paper ballots to produce a preference manifest, the resulting audit is a transitive audit.
The first step of a ballot-level comparison RLA is to verify that the CVRs are consistent with the reported results: that applying the social choice function to the vote subtotals implied by the sizes of the bundles and the votes they purport to contain produces the same set of winners. If the preference manifest does not produce the same set of winners as reported, the audit should not continue: there is a serious problem. The audit should also check that the number of CVRs for each contest does not exceed the number of ballots cast in the contest, which should be determined without reliance on the voting system ([banuelos2012limiting]). If the preference manifest passes these checks, the audit can begin to select ballots at random to check the accuracy of the CVRs implied by the preference manifest against a manual reading of voter intent from each paper ballot.
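The two preliminary checks can be sketched in Python as follows. The function name, the `(label, count, preference)` tuple layout, and treating a tie as a failure are assumptions for illustration; the ballot bound must come from a source independent of the voting system, as described above.

```python
from collections import Counter


def check_preference_manifest(manifest, reported_winner, ballot_upper_bound):
    """Return a list of problems; an empty list means sampling may begin.

    manifest: iterable of (bundle_label, count, preference) tuples.
    ballot_upper_bound: trustworthy bound on ballots cast, obtained without
        relying on the voting system (e.g., from pollbooks).
    """
    tallies = Counter()
    total = 0
    for label, count, preference in manifest:
        if count < 0:
            raise ValueError(f"negative count on bundle {label}")
        tallies[preference] += count
        total += count

    problems = []
    if total > ballot_upper_bound:
        problems.append(f"manifest lists {total} ballots, more than the bound {ballot_upper_bound}")

    # Plurality winner implied by the manifest; a tie is also treated as a problem.
    top = max(tallies.values(), default=0)
    leaders = sorted(c for c, v in tallies.items() if v == top)
    if leaders != [reported_winner]:
        problems.append(f"manifest implies winner(s) {leaders}, not {reported_winner}")
    return problems


manifest = [("B17-Priya-0", 50, "Priya"), ("B17-Priya-1", 50, "Priya"),
            ("B17-Shyam-0", 70, "Shyam"), ("B17-Ramith-0", 10, "Ramith")]
```

With this hypothetical manifest (180 ballots), `check_preference_manifest(manifest, "Priya", 200)` reports no problems, whereas a bound of 150 or a reported winner of `"Shyam"` each produces one problem and the audit should not proceed.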
[Kroll2014] present a method for reducing the workload in auditing multi-level elections, inspired by the US Electoral College. They show that to achieve an overall confidence that a party or coalition secured the majority of seats, the individual constituencies can sometimes be audited to lower confidence levels. They provide a constraint optimization program describing the set of feasible solutions (i.e., those that constitute a sufficient audit) and a number of methods for finding the optimal solution. In India’s electoral system, as in many other parliamentary democracies, every constituency has equal weight.
3 Auditing individual constituencies using extant methods
This section discusses how existing methods for RLAs apply to Indian elections. We consider auditing individual constituencies rather than the entire election; Section 4 shows how to combine audits of constituencies to audit the overall parliamentary outcome.
India’s voting system currently does not support ballot-level comparison audits, but, as described above, if procedures were added to sort the paper ballots and to report a preference manifest, transitive ballot-level comparison audits would be possible. Because ballots in India are simple—a single selection in a single contest—such sorting is feasible.
Ballot-polling audits could be used in India without sorting the ballots or modifying the voting system, if ballot manifests were available (see Section 3.2). The calculations for BRAVO ([Lindeman2012]) and the ballot-polling method in [LindemanStark2012] are simple enough to do with pencil and paper or a hand calculator, and Stark provides open-source online tools for them at https://www.stat.berkeley.edu/~stark/Vote/ballotPollTools.htm.
When the election outcome is correct, ballot-level comparison audits generally require inspecting fewer ballots than ballot-polling audits. (Because they are RLAs, when the outcome is incorrect, both methods have a large chance of requiring a full manual tally.) The advantage grows as the margin shrinks: as a rule of thumb, the workload is roughly inversely proportional to the reported margin for ballot-level comparison audits, and roughly inversely proportional to the square of the actual margin for ballot-polling audits. However, preparing for a ballot-level comparison audit is harder, because it requires CVRs linked to the corresponding physical ballots. Although ballot-polling audits may require inspecting more ballots, their simplicity may offset the work of examining more paper, unless the margin is very small.
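To make the rule of thumb concrete, this Python sketch compares two back-of-the-envelope formulas: the LSKM ballot-level comparison sample size when no discrepancies are found (the stopping rule of Section 3.1.1 with all discrepancy counts zero), and Wald's approximation to the expected BRAVO sample size in a two-candidate contest with no invalid ballots. Both assume the reported results are correct. The Wald approximation ignores overshoot, so it is indicative only and need not match the average sample numbers from Stark's online tools. The function names and the default $\gamma = 1.03905$ are our assumptions.

```python
import math


def comparison_sample_size(diluted_margin, alpha=0.05, gamma=1.03905):
    """LSKM sample size if no discrepancies are found: grows like 1 / margin."""
    return math.ceil(-2 * gamma * math.log(alpha) / diluted_margin)


def bravo_asn_wald(margin, alpha=0.05):
    """Wald's approximate expected BRAVO sample size, two candidates, no invalid ballots.

    margin is the winner's share minus the loser's share. The expected log-likelihood
    ratio per ballot is roughly margin**2 / 2, so the result grows like 1 / margin**2.
    """
    p_w = (1 + margin) / 2  # winner's share, equal to the conditional share s_wl here
    p_l = 1 - p_w
    drift = p_w * math.log(2 * p_w) + p_l * math.log(2 * p_l)
    return math.log(1 / alpha) / drift
```

For example, at a 5% risk limit, `comparison_sample_size(0.1)` is 63 and halving the margin to 0.05 gives 125, whereas `bravo_asn_wald` is roughly 600 at a margin of 0.1 and roughly quadruples when the margin is halved.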
3.1 Transitive Ballot-Level Comparison RLA
Ballot-level comparison RLAs were introduced by [Stark2010], who provides online tools at https://www.stat.berkeley.edu/~stark/Vote/auditTools.htm; see also [LindemanStark2012]. Ballot-level comparison audits require a way to find the CVR corresponding to each paper ballot, and vice versa. The EVMs currently used in India do not provide CVRs at all.
However, sorting ballots into groups according to the vote (if any) that they are reported to show in effect provides a CVR for each ballot through a preference manifest that lists the bundles of ballots, the number of ballots in each bundle, and the (single) preference that every ballot in the bundle is supposed to show. This approach was used by [Schurmann2016] to audit an election in Denmark.
In Denmark, ballots are manually sorted into bundles with homogeneous voter intent, but sorting could be automated with relatively simple equipment, possibly something similar to the system used in South Korea. Whether it is worth the effort to sort the ballots depends on the margin in the contest: if the margin is wide, ballot polling will be less expensive, but if the margin is very narrow, sorting, whether manual or automated, may reduce the sample size required to confirm the outcome by orders of magnitude, which can justify its cost.
3.1.1 Classifying CVR errors
[Stark2009b] reviews a number of methods to test the hypothesis that any loser received more votes than any winner by comparing hand counts of votes in randomly selected batches of ballots to the machine counts of the votes on the same ballots. The methods apply to arbitrarily small batches, including batches consisting of a single ballot; that is, to ballot-level comparison audits. [Stark2010] elaborated on one of those methods, which relies on the Kaplan-Markov inequality. By introducing a taxonomy of discrepancies, the arithmetic can be simplified to the point that a pencil and paper suffice, while rigorously maintaining the risk limit. That “super-simple simultaneous single-ballot” method was further simplified by [LindemanStark2012], and is the basis of pilot audits in Denmark, California, Colorado, Indiana, Michigan, New Jersey, and Virginia, and of the statutory risk-limiting audits in most Colorado counties.
[stark2014verifiable] presented a ballot-level comparison RLA method based on the Kaplan-Wald inequality, which has some advantages over the Kaplan-Markov inequality. In this paper, we use the Kaplan-Markov method as simplified by [LindemanStark2012], because it has been used more widely. We shall refer to it as the LSKM method. It is straightforward to modify the procedures below to use the method of [stark2014verifiable] instead.
The LSKM method is sequential: it involves examining more and more ballots selected at random until either there is strong evidence that the reported winners really won, or until there has been a full hand count and the correct outcome is known.
Conceptually, after examining one or more ballots, one calculates a sequentially valid $P$-value of the hypothesis that the outcome is wrong. (Sequentially valid means that, if the null hypothesis is true, the chance that the infimum of the $P$-value over all sample sizes is less than or equal to $p$ is at most $p$, for every $p \in [0, 1]$. In contrast, standard hypothesis tests are designed for sample sizes that are fixed ahead of time: expanding the sample and re-calculating the $P$-value for such tests generally produces type I error rates far larger than the nominal significance level, because it does not account for multiplicity.) If that $P$-value is less than or equal to the risk limit, the audit stops; otherwise, more ballots are audited and the sequential $P$-value is updated. The method presented below to check the overall electoral outcome involves combining the $P$-values for individual constituencies.
If the audit does lead to a full hand tally in a constituency, the reported results are replaced by the results according to that full hand tabulation. Election officials may elect to terminate the audit and conduct a full hand count at any time, for instance, if they estimate that the cost of additional sampling will exceed the cost of a full manual tally.
The LSKM method involves classifying discrepancies between the CVR and a manual reading of voter intent from the paper ballot:
If correcting the CVR would reduce the margin between any (reported) winner and any (reported) loser by two votes, the discrepancy is a 2-vote overstatement (the number of 2-vote overstatements is denoted $o_2$).
If not, but if correcting the CVR would reduce the margin between any winner and any loser by one vote, the discrepancy is a 1-vote overstatement (the number of 1-vote overstatements is denoted $o_1$).
If not, but if correcting the CVR would not increase the margin between every winner and every loser, the discrepancy is a neutral error. (Neutral errors do not enter the stopping rule explicitly.)
If not, but if correcting the CVR would increase the margin between every winner and every loser by at least one vote, and increase the margin between some winner and some loser by exactly one vote, the discrepancy is a 1-vote understatement (the number of 1-vote understatements is denoted $u_1$).
If correcting the CVR would increase the margin between every winner and every loser by two votes, it is a 2-vote understatement (the number of 2-vote understatements is denoted $u_2$).
Two-vote overstatements should be rare if the voting system is working correctly: they involve mistaking a vote for a loser as a vote for a winner. Two-vote understatements should be even rarer—and are typically mathematically impossible. For instance, in a plurality, vote-for-one contest with three or more candidates, two-vote understatements are impossible, because they would require having mistaken a valid vote for the winner as a valid vote for every losing candidate.
We assume that there is a trustworthy upper bound on the total number of ballots cast, for instance, from pollbooks or from information about the number of eligible voters. A preliminary check should ensure that the preference manifest does not list more ballots than that upper bound: if there are more ballots listed than can exist, there is a serious problem that the audit cannot address by itself. (Prof. Sandeep Shukla of IIT Kanpur has pointed out that the current Indian VVPAT design does not protect against the EVM adding electronic votes and corresponding VVPATs when the voter is not looking, because there is no publicly observable mechanism to ensure that at most one VVPAT is inserted into the box per voter. This needs to be addressed by improving the physical design, which is beyond the scope of this paper.)
In the sorted-ballot method described above,
a 2-vote overstatement occurs if we find a vote for a reported loser in the reported winner’s pile;
a 1-vote overstatement occurs if we find a vote for a different reported loser in a reported loser’s pile;
neutral errors don’t occur (Indian EVMs, as far as we know, do not produce blank votes. However, if they did, they could be accommodated easily. A 1-vote overstatement occurs if we find a blank vote in the reported winner’s pile. A neutral error would occur when there were at least three candidates and we found a blank vote in a reported loser’s pile. A 1-vote understatement would occur when there were exactly two candidates and we found a blank vote in the reported loser’s pile.);
a 1-vote understatement occurs when there are at least three candidates and we find a vote for the reported winner in a reported loser’s pile;
a 2-vote understatement occurs only when there are exactly two candidates, and we find a vote for the reported winner in the reported loser’s pile;
if a pile turns out to be smaller than reported, the discrepancy can be addressed using the “phantom to zombie” approach of [banuelos2012limiting].
if a pile turns out to be larger than reported, then some other pile must be smaller than reported, and the “phantom to zombie” approach of [banuelos2012limiting] will still ensure that the risk is controlled conservatively.
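The taxonomy can be applied mechanically. Below is a minimal Python sketch for a single-winner, vote-for-one plurality contest, as in Indian constituencies, where each ballot shows one candidate or, hypothetically, a blank (`None`). For every reported loser it computes how much the CVR overstates the winner's margin over that loser, and classifies the discrepancy by the largest such overstatement. The function name and the `None` convention are our own; the usage lines reproduce the sorted-ballot cases listed above.

```python
def classify_discrepancy(cvr_vote, actual_vote, winner, losers):
    """Classify a CVR/paper discrepancy as 'o2', 'o1', 'neutral', 'u1', or 'u2'.

    A ballot contributes +1 to the margin (winner minus loser) if it shows the winner,
    -1 if it shows that loser, and 0 otherwise. A CVR that matches the paper is 'neutral'.
    """
    def contribution(vote, loser):
        return int(vote == winner) - int(vote == loser)

    # How much the CVR overstates each winner-loser margin, relative to the paper.
    overstatements = [contribution(cvr_vote, l) - contribution(actual_vote, l) for l in losers]
    worst = max(overstatements)
    return {2: "o2", 1: "o1", 0: "neutral", -1: "u1", -2: "u2"}[worst]


losers = ["Ramith", "Shyam"]  # reported losers when Priya is the reported winner
classify_discrepancy("Priya", "Shyam", "Priya", losers)    # -> 'o2' (loser's vote in winner's pile)
classify_discrepancy("Shyam", "Ramith", "Priya", losers)   # -> 'o1' (other loser's vote in loser's pile)
classify_discrepancy("Shyam", "Priya", "Priya", losers)    # -> 'u1' (winner's vote in loser's pile)
classify_discrepancy("Shyam", "Priya", "Priya", ["Shyam"]) # -> 'u2' (same, two candidates)
```

Taking the maximum over losers is what makes "reduce the margin between any winner and any loser" and "increase the margin between every winner and every loser" line up with a single number per ballot.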
There are sharper ways to treat discrepancies than to use these categories (in particular, keeping track of which margins are affected by each discrepancy can reduce the number of ballots the audit inspects; see [Stark2010]). However, the bookkeeping is more complex. Categorizing discrepancies this way makes the calculations simple enough to do with a pencil and paper (aside from calculating 5 constants involving logarithms, which can be done once and for all and verified by anyone).
Let $n$ denote the current sample size and $\alpha$ the risk limit. Fix $\gamma \ge 1$. The LSKM method stops auditing (and concludes that the reported winners really won) if
$$ n \ge -\frac{2\gamma}{\mu}\left[\ln\alpha + o_1\ln\left(1-\frac{1}{2\gamma}\right) + o_2\ln\left(1-\frac{1}{\gamma}\right) + u_1\ln\left(1+\frac{1}{2\gamma}\right) + u_2\ln\left(1+\frac{1}{\gamma}\right)\right]. $$
In this expression, $\mu$ is the diluted margin: the smallest difference in votes between any winner and any loser, divided by the total number of ballots in the population from which the sample is to be drawn, including ballots with invalid votes. The constant $\gamma$ is the error inflation factor, which controls the operating characteristics of the LSKM method when errors are observed: the larger $\gamma$ is, the fewer additional ballots need to be audited if a 2-vote overstatement is observed, but the smaller $\gamma$ is, the fewer ballots need to be audited if no 2-vote overstatements are observed. Because 2-vote overstatements should be rare, taking $\gamma$ slightly larger than 1 should suffice. If $\gamma$ is exactly 1, then a single 2-vote overstatement prevents the audit from terminating without a full hand count. [LindemanStark2012] suggest using $\gamma = 1.03905$, which makes the “cost” of a 2-vote overstatement 5 times the “cost” of a 1-vote overstatement, where “cost” means the number of additional ballots that must be audited to attain the risk limit. Any value $\gamma \ge 1$ gives a risk-limiting audit, but $\gamma$ must be chosen before any ballots have been audited.
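The stopping rule is easy to evaluate in code. This Python sketch computes it together with the Kaplan-Markov $P$-value from which, as we understand [Stark2010] and [LindemanStark2012], the rule is obtained via the approximation $\ln(1-\mu/(2\gamma)) \approx -\mu/(2\gamma)$; since $\ln(1-x) \le -x$, the rule is slightly conservative relative to the $P$-value. That $P$-value is the kind of sequential constituency-level $P$-value combined in Section 4. Function names and defaults are assumptions for illustration.

```python
import math


def _log_discrepancy_terms(gamma, o1, o2, u1, u2):
    """o1*ln(1-1/(2g)) + o2*ln(1-1/g) + u1*ln(1+1/(2g)) + u2*ln(1+1/g)."""
    total = o1 * math.log(1 - 1 / (2 * gamma)) + u1 * math.log(1 + 1 / (2 * gamma))
    total += u2 * math.log(1 + 1 / gamma)
    if o2:
        if gamma == 1:
            return -math.inf  # ln(0): a 2-vote overstatement can never be offset
        total += o2 * math.log(1 - 1 / gamma)
    return total


def lskm_sample_size(diluted_margin, alpha, gamma=1.03905, o1=0, o2=0, u1=0, u2=0):
    """Smallest n meeting the LSKM stopping rule (math.inf if the audit cannot stop)."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    bracket = math.log(alpha) + _log_discrepancy_terms(gamma, o1, o2, u1, u2)
    if bracket == -math.inf:
        return math.inf
    return max(0, math.ceil(-2 * gamma * bracket / diluted_margin))


def lskm_p_value(n, diluted_margin, gamma=1.03905, o1=0, o2=0, u1=0, u2=0):
    """Kaplan-Markov P-value after n ballots with the given discrepancy counts."""
    log_p = n * math.log(1 - diluted_margin / (2 * gamma))
    log_p -= _log_discrepancy_terms(gamma, o1, o2, u1, u2)
    return 1.0 if log_p >= 0 else math.exp(log_p)  # cap at 1, avoid overflow
```

For instance, with $\mu = 0.1$, $\alpha = 0.05$, and no discrepancies, `lskm_sample_size(0.1, 0.05)` is 63, and a single 1-vote overstatement raises it to 76. The ratio `math.log(1 - 1/1.03905) / math.log(1 - 1/(2*1.03905))` is about 5, which is the cost ratio quoted above.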
3.2 Ballot-polling RLA using BRAVO
The BRAVO ballot-polling RLA by [Lindeman2012] can be applied immediately to constituencies in India. In the Indian scenario, we have only one winner per constituency and one candidate per ballot. For each reported loser $\ell$, the null hypothesis $H_{0\ell}$ states that the reported winner $w$ did not get more votes than $\ell$, that is, that the true outcome was a tie or that $\ell$ actually won. BRAVO uses Wald’s Sequential Probability Ratio Test [Wald1945] to test all the null hypotheses simultaneously.
The audit begins by choosing the risk limit $\alpha$, the maximum probability of stopping short of a full manual tally if a full manual tally would show a different electoral outcome. It also requires the reported vote totals for each candidate (there are other ballot-polling methods that do not use the reported results at all), but no other data from the voting system.
For every apparent loser $\ell$, define the conditional vote share $s_{w\ell}$:
$$ s_{w\ell} = \frac{v_w}{v_w + v_\ell}. $$
Here, $v_w$ and $v_\ell$ are the reported vote totals for the winner and the loser, respectively. If the reported vote tally is correct, the chance that a randomly selected ballot shows a vote for $w$, given that it shows a vote for either $w$ or $\ell$, is $s_{w\ell}$.
BRAVO maintains a test statistic $T_{w\ell}$ for each reported (winner, loser) pair. In Indian elections, there is only one reported winner per constituency, so this amounts to a test statistic for each reported loser $\ell$. Null hypothesis $H_{0\ell}$ is rejected if
$$ T_{w\ell} \ge \frac{1}{\alpha}. $$
If the null hypotheses for all apparent losers are rejected, we stop the audit and declare the announced outcome correct.
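At any time, for example if the audit is expected to take more time than simply counting the ballots, we can stop the audit and perform a full manual recount. The algorithm runs as follows:
Set $T_{w\ell} = 1$ for every reported loser $\ell$.
Select a ballot (VVPAT) uniformly at random, with replacement, and read the vote it shows.
If the ballot shows a vote for $w$, multiply every $T_{w\ell}$ by $s_{w\ell}/0.5$. If it shows a vote for a reported loser $\ell$, multiply $T_{w\ell}$ by $(1 - s_{w\ell})/0.5$. A ballot that shows neither $w$ nor $\ell$ leaves $T_{w\ell}$ unchanged.
If $T_{w\ell} \ge 1/\alpha$, reject $H_{0\ell}$. If every null hypothesis has been rejected, stop and declare the reported outcome correct; otherwise, select another ballot and repeat, or stop and conduct a full manual tally.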
At any stage, $\max_\ell \min\{1, 1/T_{w\ell}\}$ is a conservative sequential $P$-value for the hypothesis that the reported winner did not actually win the constituency.
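The BRAVO steps for one constituency can be sketched in Python as follows. This is illustrative, not Stark's tool: the function name, the argument layout, and the candidate names in the usage are hypothetical. It returns a $P$-value of the form $\max_\ell \min\{1, 1/T_{w\ell}\}$, using for each loser the largest value $T_{w\ell}$ has reached, so that a rejected hypothesis stays rejected; as we understand it, this remains a valid sequential $P$-value because each $T_{w\ell}$ is a nonnegative supermartingale under its null hypothesis.

```python
def bravo(sample, reported, winner, alpha):
    """Run BRAVO on ballots drawn uniformly at random with replacement, in draw order.

    sample: votes read from the sampled VVPATs (candidate names; anything else is invalid).
    reported: dict mapping each candidate to the reported vote total.
    Returns (confirmed, ballots_examined, p_value).
    """
    losers = [c for c in reported if c != winner]
    share = {l: reported[winner] / (reported[winner] + reported[l]) for l in losers}
    T = {l: 1.0 for l in losers}
    best = dict(T)  # largest value each T_wl has reached; rejection is permanent
    n = 0
    for n, vote in enumerate(sample, start=1):
        for l in losers:
            if vote == winner:
                T[l] *= share[l] / 0.5
            elif vote == l:
                T[l] *= (1 - share[l]) / 0.5
            best[l] = max(best[l], T[l])
        if all(best[l] >= 1 / alpha for l in losers):
            break  # every null hypothesis rejected: stop sampling
    p_value = max(min(1.0, 1 / best[l]) for l in losers)
    return p_value <= alpha, n, p_value


reported = {"A": 600, "B": 250, "C": 150}  # hypothetical reported totals
confirmed, n, p = bravo(["A"] * 20, reported, "A", 0.05)
```

With these hypothetical totals, nine consecutive votes for the reported winner suffice at a 5% risk limit, because $(2 \cdot 600/850)^9 \approx 22.3 \ge 20$ while the eighth power is about 15.8.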
3.2.1 Number of votes to be audited
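Consider an example of a 3-candidate contest with a single plurality winner. The candidates are Ramith, Shyam and Priya. Their respective vote shares are recorded in the following table:
In this case, the winner is Priya. Let us denote the winner-loser pairs as $(P, R)$ for Priya and Ramith and $(P, S)$ for Priya and Shyam. The conditional vote shares are $s_{PR} = v_P/(v_P + v_R)$ and $s_{PS} = v_P/(v_P + v_S)$, where $v_P$, $v_R$, and $v_S$ are the reported vote totals.
We set the risk limit at $\alpha = 5\%$. Every time the audit selects a ballot that shows a vote for Priya, we multiply $T_{PR}$ by $2s_{PR}$ and $T_{PS}$ by $2s_{PS}$. Therefore, the minimum sample size $n$ to attain the risk limit satisfies
$$ (2s_{PR})^n \ge \frac{1}{\alpha} \quad \text{and} \quad (2s_{PS})^n \ge \frac{1}{\alpha}, \quad \text{i.e.,} \quad n \ge \frac{\ln(1/\alpha)}{\ln\left(2\min\{s_{PR}, s_{PS}\}\right)}. $$
The smallest such $n$ is 14. Hence, we need to audit at least 14 ballots; if they all show votes for Priya, BRAVO will confirm the election outcome at risk limit 5%.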
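The same calculation for arbitrary reported tallies, as a Python sketch (the function name is hypothetical): the binding constraint comes from the loser with the smallest conditional share $s_{w\ell}$, i.e., the runner-up. For instance, a conditional share of 0.625 gives $1.25^{13} \approx 18.2 < 20 \le 1.25^{14} \approx 22.7$, so 14 ballots at a 5% risk limit. The tallies in the usage are hypothetical, not the values in the table.

```python
import math


def bravo_min_sample_size(reported, winner, alpha):
    """Fewest ballots BRAVO can examine: every sampled ballot shows a vote for the winner."""
    n = 0
    for loser, votes in reported.items():
        if loser == winner:
            continue
        s = reported[winner] / (reported[winner] + votes)  # conditional vote share s_wl
        if s <= 0.5:
            raise ValueError(f"{winner} is not reported ahead of {loser}")
        # Need (2 s)^n >= 1 / alpha for this loser.
        n = max(n, math.ceil(math.log(1 / alpha) / math.log(2 * s)))
    return n


two_way = bravo_min_sample_size({"A": 625, "B": 375}, "A", 0.05)
three_way = bravo_min_sample_size({"A": 600, "B": 250, "C": 150}, "A", 0.05)
```

This minimum is only attainable if every sampled ballot shows the winner; the expected sample size discussed next is typically much larger.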
If the reported election results were accurate, on average each candidate’s share of the sampled ballots would match his or her reported vote share. [Lindeman2012] describe how to find the average sample number (ASN), the expected sample size necessary to reject all the null hypotheses, assuming the reported results are indeed correct. Stark’s online ballot-polling tool shows an ASN of 123 for this example.
3.3 Improved methods for single-constituency RLAs
There have been numerous improvements to the efficiency of Risk-Limiting Audits, any of which could easily apply to India’s simple electoral system. See for example https://github.com/pbstark/S157F17/blob/master/kaplanWald.ipynb and https://github.com/pbstark/S157F17/blob/master/pSPRTnoReplacement.ipynb.
The next section explains how to audit the overall parliamentary winner by an efficient combination of single-constituency audits. It requires independent, sequentially valid $P$-values $p_c$ for the hypotheses that the reported outcome in constituency $c$ is incorrect. It does not require the $P$-values to be obtained using the same method. For instance, some constituencies could use ballot polling and others could use transitive ballot-level comparison audits.
4 Auditing the overall parliamentary winner
A party or a coalition needs a majority of the seats in the Lower House of Parliament to form a new government. The total number of seats is 543, so to win, a party or coalition needs at least 272 seats. The audit needs to confirm that the reported winning party or coalition truly won at least 272 seats. (Which particular seats the reported winner won is immaterial to whether it won overall.) If party $A$ supposedly won $k \ge 272$ constituencies, then for $A$ not to have actually won a majority, the reported outcome must be wrong in at least $k - 271$ of the constituencies that $A$ supposedly won. Note that this condition is necessary but not sufficient for the parliamentary outcome to be wrong: if $A$ in fact won some constituencies it was reported to have lost, the outcome could be wrong in $k - 271$ constituencies $A$ supposedly won and yet $A$ could still be the overall winner. However, if the audit provides strong evidence that there is no set of $k - 271$ constituencies reportedly won by $A$ in which $A$ did not actually win, $A$ must be the overall winner.
Let $\mathcal{W}$ denote the set of constituencies $A$ reportedly won. Then $|\mathcal{W}| = k \ge 272$, where $|X|$ denotes the cardinality of a set $X$. If there is no set $S \subseteq \mathcal{W}$ with $|S| = k - 271$ for which $A$ lost in every $c \in S$, $A$ really won overall.
Let denote the overall risk limit, and let denote a -value for the hypothesis that the reported outcome in constituency is wrong. We suppose that the audits in different constituencies rely on independently selected random samples of ballots, so the -values
are independent random variables. If the reported outcome in constituency
is incorrect, the probability distribution of
is stochastically dominated by a uniform distribution. That is,if the reported outcome in constituency is wrong.
Fisher's combining function for a set $S$ of $P$-values is
$$\chi^2(S) \equiv -2 \sum_{c \in S} \ln p_c.$$
If the $P$-values are independent and all the null hypotheses are true, the probability distribution of $\chi^2(S)$ is stochastically smaller than a chi-square distribution with $2|S|$ degrees of freedom (see, e.g., [ottoboni2018risk]). That is, if the reported outcome in every constituency $c \in S$ is wrong,
$$\Pr\left\{\chi^2(S) \ge \chi^2_{2|S|}(1-\alpha)\right\} \le \alpha,$$
where $\chi^2_{d}(1-\alpha)$ is the $1-\alpha$ quantile of the chi-square distribution with $d$ degrees of freedom.
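As an illustration of Fisher's combining function and the threshold test above, here is a minimal Python sketch. It uses the fact that the degrees of freedom $2|S|$ are always even, so the chi-square upper tail has a closed form and no statistics library is needed. The function names are hypothetical, and the Monte Carlo check covers only a single look at uniform $P$-values (the boundary case of a true null), not the sequential setting.

```python
import math
import random


def fisher_statistic(p_values):
    """Fisher's combining function chi2(S) = -2 * sum of ln p_c."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_upper_tail_even_df(x, df):
    """Pr{X >= x} for X ~ chi-square with an even number df = 2k of degrees
    of freedom, via the closed form exp(-x/2) * sum_{j<k} (x/2)**j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)**j / j! incrementally
        total += term
    return math.exp(-half) * total


def fisher_rejects(p_values, alpha):
    """True when chi2(S) >= chi2_{2|S|}(1 - alpha), i.e. when the combined
    P-value Pr{X >= chi2(S)} is at most alpha."""
    if any(p == 0 for p in p_values):
        return True  # ln 0 = -inf, so the statistic is infinite
    stat = fisher_statistic(p_values)
    return chi2_upper_tail_even_df(stat, 2 * len(p_values)) <= alpha


def rejection_rate_under_null(k, alpha, trials=20000, seed=1):
    """Monte Carlo check: with k independent Uniform(0, 1] P-values the test
    should reject in roughly a fraction alpha of trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ps = [1.0 - rng.random() for _ in range(k)]  # in (0, 1], avoids ln 0
        hits += fisher_rejects(ps, alpha)
    return hits / trials
```

Comparing the closed-form tail probability with $\alpha$ is equivalent to comparing $\chi^2(S)$ with the quantile $\chi^2_{2|S|}(1-\alpha)$, because the tail probability is continuous and decreasing in its argument.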
The overall strategy for auditing the parliamentary outcome is thus as follows:

1. Select an overall risk limit $\alpha$ for the parliamentary outcome.
2. Define the following notation:
   - $W$ denotes the set of constituencies the reported winning party $A$ allegedly won, and $w = |W|$.
   - $\mathcal{S}$ denotes the set of all subsets of $W$ with cardinality $w - 271$.
   - $\chi^2_d(1-\alpha)$ denotes the $1-\alpha$ quantile of the chi-square distribution with $d$ degrees of freedom.
   - For any set $S$ of constituencies, $\chi^2(S) \equiv -2 \sum_{c \in S} \ln p_c$, where $p_c$ is the $P$-value of the hypothesis that the reported winner in constituency $c$ did not really win, based on the audit sample selected from constituency $c$ so far. Before any data are collected from $c$, $p_c = 1$.
   - For any collection $\mathcal{T}$ of sets of constituencies, define $\mathcal{T} \ominus c \equiv \mathcal{T} \setminus \{\text{all sets of constituencies in } \mathcal{T} \text{ that contain } c\}$.
3. Initialization: Set $\mathcal{U} = \mathcal{S}$. Select an initial sample size $n_c$ for each constituency $c \in W$, and draw the initial sample. It is permissible, but not advisable, to let $n_c = 0$ in any constituency $c$.
4. Audit: While $\mathcal{U}$ is not empty:
   (a) If there is any $S \in \mathcal{U}$ for which every $c \in S$ has been fully hand counted and the hand count has shown that the reported winner was incorrect in every $c \in S$, stop and perform a full hand count of the entire election.
   (b) For every constituency $c$ that has been fully hand counted, if the hand count confirms the reported outcome in $c$, set $\mathcal{U} \leftarrow \mathcal{U} \ominus c$.
   (c) Remove from $\mathcal{U}$ every $S$ for which $\chi^2(S) \ge \chi^2_{2|S|}(1-\alpha)$.
   (d) Increase $n_c$ in one or more constituencies $c$ and inspect the additional ballots. (The rule for increasing sample sizes could be as simple as "increase every $n_c$ by 25%," or it could be designed to minimize the expected total amount of auditing required, for instance, by preferentially increasing the sample size in constituencies with large margins and taking into account differences in auditing methods in different jurisdictions (ballot polling versus transitive ballot-level comparison). All else equal, when the outcome is correct, the larger the true margin, the more auditing an additional ballot is expected to decrease the $P$-value. Similarly, all else equal, auditing an additional ballot in a jurisdiction conducting a transitive ballot-level comparison RLA is expected to decrease the $P$-value more than auditing an additional ballot in a jurisdiction conducting a ballot-polling RLA.)

If the loop terminates with $\mathcal{U} = \emptyset$, the audit has confirmed the parliamentary outcome at risk limit $\alpha$.
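The procedure above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes the constituency-level $P$-values are computed elsewhere (by ballot polling or transitive ballot-level comparison), it enumerates $\mathcal{S}$ explicitly, which is feasible only for toy examples, and the "at least one extra ballot" tweak in the 25% sample-size rule is an assumption added here. All function names are hypothetical.

```python
import math
from itertools import combinations


def initial_collection(W, k):
    """Step 3: U = S, every subset of the reported wins W of size k = w - 271."""
    return {frozenset(S) for S in combinations(sorted(W), k)}


def ominus(T, c):
    """T (-) c: drop from collection T every set that contains constituency c."""
    return {S for S in T if c not in S}


def chi2_upper_tail_even_df(x, df):
    """Pr{X >= x} for X ~ chi-square with even df, via the closed form."""
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def removed_by_fisher(S, p, alpha):
    """Step (4c) test: chi2(S) >= chi2_{2|S|}(1 - alpha). Constituencies with
    no audit data yet default to p_c = 1, as in the definition of chi2(S)."""
    ps = [p.get(c, 1.0) for c in S]
    if any(x == 0 for x in ps):
        return True
    stat = -2.0 * sum(math.log(x) for x in ps)
    return chi2_upper_tail_even_df(stat, 2 * len(ps)) <= alpha


def audit_pass(U, p, hand_count, alpha):
    """One pass through steps (4a)-(4c).

    hand_count maps each fully hand-counted constituency to True if the count
    confirmed the reported outcome and False if it overturned it.
    Returns ("full_hand_count", U) if step (4a) fires, else ("continue", U').
    """
    for S in U:  # (4a): some S fully hand counted and wrong everywhere
        if all(hand_count.get(c) is False for c in S):
            return "full_hand_count", U
    for c, confirmed in hand_count.items():  # (4b)
        if confirmed:
            U = ominus(U, c)
    U = {S for S in U if not removed_by_fisher(S, p, alpha)}  # (4c)
    return "continue", U


def increase_samples(n, factor=1.25):
    """(4d) with the simplest rule: grow every n_c by 25%.
    Assumption: always add at least one ballot so that n_c = 0 can grow."""
    return {c: max(n_c + 1, math.ceil(n_c * factor)) for c, n_c in n.items()}
```

For instance, with three hypothetical reported wins and a critical set size of 2, $P$-values of 0.001, 0.002 and 0.6 empty $\mathcal{U}$ at $\alpha = 0.05$ in a single pass, which is the stopping condition that confirms the parliamentary outcome.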
Proof that the algorithm above is an RLA for the parliamentary outcome. We need to show that if the parliamentary outcome is wrong, the chance that the audit stops without a full manual tally of every constituency is at most $\alpha$. If the parliamentary outcome is wrong, the reported winner is wrong in every $c \in S$ for some $S \in \mathcal{S}$. Suppose there is such an $S$. If the audit leads to hand counting every $c \in S$, step (4a) ensures that there will be a full hand count of the entire election. Therefore, there will be a full hand count unless $S$ is removed from $\mathcal{U}$. There are two places where sets of constituencies can be removed from $\mathcal{U}$: step (4b) and step (4c). Step (4b) cannot remove $S$ from $\mathcal{U}$, because, by assumption, hand counting any $c \in S$ would not confirm the reported outcome in $c$. Therefore, the chance that $S$ is not fully hand counted is at most the chance that step (4c) removes $S$ from $\mathcal{U}$. But, by construction (through Fisher's combining function applied to the independent constituency-level $P$-values), that chance is not larger than $\alpha$. If there is more than one $S \in \mathcal{S}$ for which every reported outcome is wrong, the audit must erroneously remove all of them at step (4c). But the chance of erroneously removing all of them cannot be larger than the chance of removing any one of them individually, which is in turn at most $\alpha$.
5 Conclusion and future work
We have presented an approach to conduct risk-limiting audits of the national outcome of Indian elections by combining audits conducted in different constituencies using independent samples. Within a given constituency, the audit could use ballot polling or, with an initial step of sorting VVPATs, transitive ballot-level comparisons. The $P$-values in different constituencies are combined using Fisher's combining function over each set in a collection of sets of constituencies. The collection is constructed in such a way that for the reported parliamentary outcome to be wrong, the reported outcome must be wrong in every constituency in at least one of the sets. If there is strong statistical evidence that there is no set of constituencies in the collection for which every reported outcome is wrong, that confirms the national parliamentary outcome. The multi-level structure of Indian parliamentary elections makes it possible to have high confidence in the overall parliamentary outcome without necessarily auditing every constituency to a low risk limit. In future research we will address how to schedule increases in sample sizes in different constituencies to minimize the total expected workload, taking into account the reported margins in different constituencies and the auditing methods used in different constituencies.
Many thanks to Archanaa Krishnan, Chittaranjan Mandal, Sandeep Shukla, Peter Stuckey and Poorvi Vora for valuable suggestions on this work.