Instant Runoff Voting (IRV) is a system of preferential voting in which voters rank candidates in order of preference. IRV is used for all parliamentary lower house elections in Australia, parliamentary elections in Fiji and Papua New Guinea, presidential elections in Ireland and Bosnia and Herzegovina, and local elections in numerous locations worldwide, including the UK and United States. Given candidates , , , and , each vote in an IRV election is a (possibly partial) ranking of these candidates. A vote with the ranking  expresses a first preference for candidate , a second preference for , and a third for . The tallying of votes proceeds by distributing each vote to its first ranked candidate. The candidate with the fewest votes is eliminated, and their votes are redistributed to the next most-preferred candidate still standing on each ballot. Elimination proceeds in this fashion until a single candidate remains, who is declared the winner.
The scanning and digitisation of ballots, and the use of automated counting software for computing the outcomes of elections, are becoming more commonplace. In light of recent attempts by foreign powers to interfere in electoral processes in the US, there is a growing need for efficient and statistically sound electoral audits. Risk Limiting Audits (RLAs) provide strong statistical evidence that the reported outcome of an election is correct, or revert to a manual recount if it is wrong. The probability that the audit fails to detect a wrong outcome is bounded by a risk limit. An RLA with a risk limit of 1%, for example, has at most a 1% chance of failing to detect that a reported election outcome is wrong. In this paper we present several methods for undertaking both ballot-polling and ballot-level comparison RLAs of IRV elections, and compare the auditing effort required by each on a set of real IRV election instances. We show that risk limiting audits for IRV elections can be designed that, in general, require only a small fraction of cast ballots to be sampled.
Risk limiting ballot-polling and ballot-level comparison audits have been developed for first-past-the-post (FPTP) or multiple-winner plurality elections [8, 19]. In such elections, the candidates with the most votes are declared winners. A ballot-polling RLA of such an election randomly samples the paper ballots cast (or records produced). As each ballot is examined, we update a series of statistics, one for each hypothesis that a reported loser actually received more votes than a reported winner. Once we have seen enough ballots to be confident that all these hypotheses can be rejected, the reported outcome is accepted as correct and the audit concludes. At any point, the audit can fall back to a full manual recount, for example if it is taking too long or has examined a large number of ballots. It is designed so that the probability of concluding with acceptance, when the result is in fact wrong, is at most the risk limit. Ballot-level comparison RLAs are applicable in settings where paper ballots have been scanned and digitised, or a paper-based electronic voting system has been used, producing an index that allows individual electronic ballots to be matched to the paper ballot they represent. Each sampled ballot is compared to its corresponding electronic record. An erroneous ballot is one that does not match its electronic record. These errors are then used to update a similar set of statistics representing our hypotheses that the reported election outcomes are actually wrong.
In this paper we present several methods for undertaking ballot-polling RLAs of IRV elections, by adapting a ballot-polling RLA method (BRAVO) designed for first-past-the-post (FPTP) or multiple-winner plurality elections. The most straightforward of these methods views, and audits, each round of an IRV election as a multiple-winner plurality election. A more efficient method, requiring fewer ballot samples, seeks to prove that the reported winner could not have been eliminated before any other candidate. The former approach is designed to confirm the correctness of each elimination in the IRV counting process, while the latter aims to confirm only that the reported winner of the election is correct. We also adapt a comparison-based RLA method, similarly designed for FPTP elections, to IRV. We describe, and evaluate, each of the auditing methods previously discussed in the comparison audit context. We compare the relative efficiency of ballot-polling and comparison-based RLAs, in terms of the level of auditing effort required, on a suite of IRV elections. As for FPTP elections, comparison RLAs require fewer ballot samples, in general, to confirm the correctness of an election result. This is because they can assess the differences between reported and actual individual ballots, a significant source of extra information that becomes particularly advantageous when the margin is small.
We present an algorithm, audit-irv, for generating a collection of facts to audit for a given IRV election. If each of these facts is confirmed with a given degree of statistical confidence, then the reported winner of the election is confirmed as the correct winner to the given degree of statistical confidence. The audit-irv algorithm finds the set of such facts that requires the least estimated auditing effort to prove. We apply this algorithm to generate efficient ballot-polling and comparison RLAs for IRV elections. We experimentally consider audits of elections in which all votes are recorded correctly, and of elections in which discrepancies exist between paper ballots and their electronic records.
The contributions of this paper are:
The first use of ballot-polling for IRV elections beyond the simplest approach of treating each round in the IRV election as a FPTP election, previously described (but not evaluated) by Sarwate et al.
The first use of ballot-level comparison auditing for IRV elections.
An efficient algorithm for determining a small set of facts to be verified by either a ballot-polling or ballot-level comparison audit in order to be confident within some given risk limit that the winner of an IRV election is the correct winner.
This paper is structured as follows. Related work is described in Section 2. Required background and definitions are presented in Section 3. Section 4 describes the ballot-polling and ballot-level comparison-based RLAs upon which our IRV audits are based. We present our IRV ballot-polling and comparison-based RLAs in Sections 5 and 6, and evaluate their efficiency in Section 7.
2 Related Work
Post-election audits are a key measure for increasing both security in our electoral systems, and public confidence in the integrity of our elections. Risk-limiting audits of reported election results against paper ballots (or records) represent the current best-practice for verifying the integrity of an election, and a central component of conducting evidence-based elections.
There is a growing literature on the use of risk-limiting audits for auditing the outcome of varying types of election (see e.g. [15, 6, 5, 3, 8, 16, 14]). Risk-limiting audits (RLAs) have been applied to a number of plurality (first-past-the-post) elections, including four 2008 elections in California and elections in over 50 Colorado counties in 2017. Stark and Teague present RLAs for D’Hondt (and similar) elections, applicable in a number of European countries such as Norway, Germany, Luxembourg, and Denmark. General auditing procedures designed to enhance electoral integrity have been outlined by Antonyan et al. The BRAVO ballot-polling risk-limiting audit, designed for FPTP elections, forms the basis of our IRV ballot-polling audits. The ballot-level comparison RLA presented by Stark forms the basis of our IRV comparison RLAs.
A straightforward RLA of an IRV election can be conducted by treating each IRV round as a separate FPTP election. This idea was described by Sarwate et al., although not explored in any detail. Sarwate et al. consider two additional approaches for designing a comparison audit of an IRV election. The first of these methods determines whether replacing an erroneous ballot with its correct representation changes the margin of victory of the election. The second samples ballots and checks whether the number of erroneous ballots in the sample exceeds a threshold whose value is based on the election’s margin of victory. We demonstrate, however, that we can more efficiently audit an IRV election outcome by simply verifying that the reported winner was not defeated by any other candidate.
In a first-past-the-post (FPTP) election, a voter marks a single candidate on their ballot when casting their vote. The candidate who receives the most votes is declared the winner. The BRAVO ballot-polling risk limiting audits , and the MACRO ballot-level comparison risk-limiting audits , are designed for -winner FPTP contests. A voter may vote for up to of the candidates on their ballot, and the candidates with the highest number of votes are declared winners. IRV, in contrast, is a form of preferential voting in which voters express a preference ordering over a set of candidates on their ballot. The tallying of votes in an IRV election proceeds by a series of rounds in which the candidate with the fewest votes is eliminated (see Figure 1) with the last remaining candidate declared the winner. All ballots in an eliminated candidate’s tally are distributed to the next most-preferred (remaining) candidate in their ranking.
Let be the set of candidates in an IRV election . We refer to sequences of candidates in list notation (e.g., ), and use such sequences to represent both votes and elimination orders. An election is defined as a multiset (a collection that allows for the inclusion of duplicate items) of ballots, each ballot a sequence of candidates in , with no duplicates, listed in order of preference (most preferred to least preferred). Throughout this paper we use the notation to denote the first candidate in a sequence . In each round of vote counting, there are a current set of eliminated candidates and a current set of candidates still standing . The winner is the last standing candidate.
Projection We define the projection of a sequence onto a set as the largest subsequence of that contains only elements of . (The elements keep their relative order in ). For example:
Each candidate has a tally of ballots. Ballots are added to this tally upon the elimination of a candidate , and are redistributed upon the elimination of .
Tally Given candidates are still standing in an election , the tally for a candidate , denoted , is defined as the number of ballots for which is the most-preferred candidate of those remaining. Recall that denotes the sequence of candidates mentioned in that are also in .
The primary vote of candidate , denoted , is the number of ballots for which is ranked highest. Note that .
Consider the IRV election of Table 1. The tallies of and , in the counting round are 26000, 10000, 9000, and 15000 votes. Candidate is eliminated, and 9000 ballots are distributed to , who now has a tally of 24000. Candidate , on 10000 votes, is eliminated next with 6000 of their ballots given to (the remainder have no subsequent preferences and are exhausted). Candidates and remain with tallies of 26000 and 30000. Candidate is eliminated and elected. ∎
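The projection, tally, and elimination steps defined above can be sketched in Python. This is an illustrative sketch only: the candidate labels A to D and the ballot profile are hypothetical, chosen so that the round tallies agree with those quoted in the example, and the function names and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

def project(ballot, standing):
    """Projection: keep only candidates in `standing`, preserving ballot order."""
    return [c for c in ballot if c in standing]

def tallies(profile, standing):
    """Tally of each standing candidate: ballots whose projection starts with them."""
    counts = Counter({c: 0 for c in standing})
    for ballot, n in profile.items():
        remaining = project(ballot, standing)
        if remaining:  # an empty projection means the ballot is exhausted
            counts[remaining[0]] += n
    return counts

def irv(profile, candidates):
    """Return (elimination order, winner) for a profile {ranking tuple: count}."""
    standing = set(candidates)
    order = []
    while len(standing) > 1:
        t = tallies(profile, standing)
        # Ties are broken by label here; real elections use legislated rules.
        loser = min(standing, key=lambda c: (t[c], c))
        order.append(loser)
        standing.remove(loser)
    return order, next(iter(standing))

# Hypothetical profile with first-round tallies 26000, 10000, 9000 and 15000.
PROFILE = {
    ("A",): 26000,
    ("B", "D"): 6000,
    ("B",): 4000,
    ("C", "D"): 9000,
    ("D",): 15000,
}
ORDER, WINNER = irv(PROFILE, "ABCD")
```

With this profile, candidate C is eliminated first, then B, then A, and D is elected, with final-round tallies of 26000 and 30000.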
4 Risk-limiting audits for FPTP
The aim of a risk limiting audit is to either gain evidence that the reported results are correct (to some risk limit ) or to correct an incorrect result by falling back to a manual recount. To this end we will consider two versions of the statistics defined in the previous section. We use the regular definition for the recorded values made during the election, and add a tilde to mean the actual values which should have been calculated, as represented in the paper record. Hence is the recorded primary vote for candidate and is the actual primary vote for the candidate.
For now we consider a simple -winner from candidates FPTP election where the candidates who have the greatest number of votes are elected. All winners are elected simultaneously and there is no transfer of votes. Given a set of candidates ( there will be a set of winners () and losers ().
We now present the BRAVO algorithm  for ballot-polling risk-limiting audits of such elections (Figure 2) and a similar algorithm for conducting a risk-limiting ballot-level comparison audit (Figure 3), adapted from the MACRO algorithm . Both methods are applicable in elections where each ballot may express a vote for one or more candidates. For our proposed IRV audits, we apply BRAVO and the ballot-level comparison RLA in contexts where each ballot represents a vote for a single candidate only (i.e., in any round of an IRV count, each ballot belongs to the tally of no more than one candidate). We describe the BRAVO and ballot-level comparison RLA algorithms in the context where each ballot is equivalent to . Then is the tally of votes for each candidate .
4.1 BRAVO: Ballot-polling Risk-Limiting Audits
A BRAVO audit independently tests null hypotheses for each winner/loser pair, representing the hypothesis that actually beat . A statistic for each test is updated when a ballot is drawn for either its winner or its loser.
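A single winner/loser test of this kind can be sketched as follows. The sketch assumes the commonly published form of the BRAVO sequential test: the statistic starts at 1, is multiplied by `s_w / 0.5` for each sampled ballot for the winner and by `(1 - s_w) / 0.5` for each ballot for the loser, and the null hypothesis is rejected once the statistic reaches `1 / alpha`. Here `s_w` is the winner's reported share of the votes cast for the pair, and the function name and the labels "A" and "C" are hypothetical.

```python
def bravo_test(sample, winner, loser, s_w, alpha):
    """Return the number of ballots drawn when the null hypothesis
    (loser actually beat winner) is rejected, or None if it never is."""
    statistic = 1.0
    for drawn, vote in enumerate(sample, start=1):
        if vote == winner:
            statistic *= s_w / 0.5
        elif vote == loser:
            statistic *= (1.0 - s_w) / 0.5
        # Ballots for any other candidate leave this pair's statistic unchanged.
        if statistic >= 1.0 / alpha:
            return drawn
    return None

# Reported tallies of 26000 (winner) and 9000 (loser) give s_w = 26000 / 35000.
S_W = 26000 / 35000
REJECTED_AT = bravo_test(["A"] * 10, "A", "C", S_W, alpha=0.05)
NEVER = bravo_test(["A", "C"] * 20, "A", "C", S_W, alpha=0.05)
```

A sample that alternates evenly between the two candidates never rejects the hypothesis in this sketch, since it provides no evidence that the reported winner is ahead.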
Given an overall risk limit we can estimate for each hypothesis the number of sampled ballots we expect will be required to reject the hypothesis assuming the announced election counts are perfectly accurate. Let be the proportion of recorded votes for candidate , i.e. . Let be the proportion of recorded votes for the winner of the votes for the winner and loser, . Clearly . Then the Average Sample Number (ASN) for BRAVO, that is the expected number of samples to reject the null hypothesis assuming the recorded counts are correct, is given by:
Consider the first round of the IRV election of Example 1. If we view this first round as a FPTP election with winners , , and , and loser , the null hypotheses we need to reject are , , . We calculate , , , and , , and . The ASN for rejecting each hypothesis using BRAVO, assuming , is 44.5, 6885, and 246 respectively. ∎
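The first of these figures can be checked by hand. The worked equation below assumes the usual BRAVO approximation for the ASN and a risk limit of $\alpha = 0.05$; both are assumptions, adopted because they reproduce the values quoted in the example. The symbols $p_w$, $p_l$ and $s_w$ are the proportions defined above, applied to the pair with reported tallies 26000 (winner) and 9000 (loser) out of 60000 ballots.

```latex
\[
  \mathrm{ASN} \approx
  \frac{\ln(1/\alpha) + 0.5\,\ln(2 s_w)}
       {p_w \ln(2 s_w) + p_l \ln(2 - 2 s_w)}
\]
\[
  p_w = \tfrac{26000}{60000} \approx 0.4333, \qquad
  p_l = \tfrac{9000}{60000} = 0.15, \qquad
  s_w = \tfrac{26000}{35000} \approx 0.7429
\]
\[
  \mathrm{ASN} \approx
  \frac{2.9957 + 0.5 \times 0.3959}
       {0.4333 \times 0.3959 + 0.15 \times (-0.6650)}
  \approx \frac{3.1937}{0.0718} \approx 44.5
\]
```

The much larger ASN for the winner on 10000 votes arises because $s_w = 10000/19000$ is close to one half, which makes the denominator very small.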
4.2 Ballot-level Comparison Risk-Limiting Audits for FPTP
Stark  presents a method for conducting a ballot-level comparison RLA of a collection of FPTP contests or races simultaneously. In this section, we describe this audit in the context of a single race, where each ballot records a vote for a single candidate. This audit randomly samples ballots from the set and finds the matching electronic records for those ballots in the set . For each ballot, we compare the actual () and recorded () representations. We assess any differences in these representations in terms of the extent to which the error overstated a pairwise margin between a winning and losing candidate. The procedure followed in this comparison RLA is shown in Figure 3.
We denote the algorithm shown in Figure 3 as MACRO. For each sampled ballot, we compute its maximum across-contest relative overstatement (MACRO [17, 18]) in the single-contest setting. In an election with winners , and losers , the MACRO for a ballot is given by:
where: is 1 if is a recorded vote for candidate , and 0 otherwise; is 1 if is an actual vote for candidate ; and the pairwise margin (difference in recorded tallies) between candidates and .
As each ballot is sampled, we multiply a running Kaplan-Markov MACRO P-value () as follows :
where: is the smallest recorded margin between a winning and losing candidate; ; is a parameter used to inflate the upper bound on errors for each ballot (see Stark  for a description of the role and importance of the ‘inflator’ ); and is defined as per Equation 4. We continue to sample ballots until either a maximum number of ballots have been checked (indicating that a full recount is required), or our statistic falls below our risk limit .
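The sampling loop described here can be sketched as below. The update rule is an assumption: it follows the Kaplan-Markov form in which each ballot multiplies the running P-value by `(1 - 1/U) / (1 - e / (2 * gamma / V))`, with `V` the smallest recorded margin, `U = 2 * gamma / (V / N)` for `N` reported ballots, and `e` the MACRO value of the sampled ballot. The function and parameter names are hypothetical.

```python
def kaplan_markov_audit(errors, min_margin, total_ballots, alpha, gamma):
    """errors: per-ballot MACRO values in sampling order.
    Returns the number of ballots examined when the P-value first
    falls to alpha or below, or None if the sample is exhausted first."""
    diluted_margin = min_margin / total_ballots
    u = 2.0 * gamma / diluted_margin
    p_value = 1.0
    for n, e in enumerate(errors, start=1):
        p_value *= (1.0 - 1.0 / u) / (1.0 - e / (2.0 * gamma / min_margin))
        if p_value <= alpha:
            return n
    return None  # in practice this would trigger a full manual recount

# Smallest margin 1000 among 60000 ballots, alpha = 0.05, gamma = 1.1 (assumed values).
CLEAN = kaplan_markov_audit([0.0] * 2000, 1000, 60000, 0.05, 1.1)
ONE_ERROR = kaplan_markov_audit([1 / 1000] + [0.0] * 1999, 1000, 60000, 0.05, 1.1)
```

Under these assumptions a single one-vote overstatement of the smallest margin noticeably increases the number of ballots that must be examined before the audit can stop.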
Given an overall risk limit , we can estimate the number of ballots that must be sampled by such an audit under the assumption that no errors are present in the electronic ballot records. We reuse the terminology of ballot polling audits, and call this number of ballots the Average Sample Number (ASN) for the audit. Given an election with reported ballots , the ASN for a comparison RLA of the form shown in Figure 3, with risk limit , is defined by Stark :
Consider again the first round of the IRV election of Example 1, viewed as a FPTP election with winners , , and , and loser . This election can be audited by a single application of MACRO (Figure 3). The tallies for each candidate are shown in Table 1, column two. The margins between each winner-loser pair in this first round election are , , and . The smallest winner-loser margin is 1000. Using the formula stated in Equation 6, with and , the expected number of ballot checks required by MACRO is 395.4, with .
When auditing this first-round election, the algorithm of Figure 3 randomly draws a paper ballot and compares it to its electronic record . If and match, the computed error is equal to 0. Consider the situation in which a paper ballot with ranking [, ] has been recorded as a [, , ] ballot, with the election profile listed in Example 1 representing reported counts. To determine the impact of this erroneous recorded ballot, we look at each winner and loser pair (,), (, ), and (, ). For each winner-loser pair, we compute, and take the maximum of, the expression:
For (, ) and (, ) the numerator in the above expression is equal to 1 – the error in the reported ballot overestimated the margin between these winners and the loser by 1 vote. For pair (, ), the numerator in the above expression is equal to 2 – the error in the reported ballot overestimated the margin between and by 2 votes. For this ballot, . ∎
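The per-ballot error computation in this example can be sketched in Python. The sketch assumes the overstatement for a winner-loser pair is the recorded vote for the winner minus the actual vote for the winner, minus the recorded vote for the loser, plus the actual vote for the loser, divided by the pair's recorded margin. The labels A to D are hypothetical, as is the assumption that the misrecorded ballot was credited to the candidate on 26000 votes.

```python
def macro_error(recorded, actual, winners, losers, tally):
    """Maximum relative overstatement of any winner-loser margin for one ballot.
    `recorded` and `actual` name the candidate the ballot counts for (or None)."""
    def v(c):  # 1 if the electronic record is a vote for c
        return 1 if recorded == c else 0

    def a(c):  # 1 if the paper ballot is a vote for c
        return 1 if actual == c else 0

    return max(
        (v(w) - a(w) - v(l) + a(l)) / (tally[w] - tally[l])
        for w in winners
        for l in losers
    )

TALLY = {"A": 26000, "B": 10000, "C": 9000, "D": 15000}
WINNERS, LOSERS = ["A", "B", "D"], ["C"]

# A paper ballot for the loser C that was recorded as a vote for winner A.
ERROR = macro_error("A", "C", WINNERS, LOSERS, TALLY)
# A ballot whose record matches the paper contributes no error.
NO_ERROR = macro_error("A", "A", WINNERS, LOSERS, TALLY)
```

In this sketch the pair with the smallest margin (1000 votes) dominates the maximum even though its numerator is only 1, because the two-vote overstatement is divided by a much larger margin of 17000.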
5 Risk-Limiting Audits for IRV Elections
In this section we present four different approaches for conducting a ballot-polling or ballot-level comparison RLA for an IRV election. The first method audits the entire elimination order, ensuring that every step in the IRV election was correct (with some confidence). The second method simplifies the auditing task in settings where we can eliminate multiple candidates in a single round. The third method seeks to examine only whether the eventual winner was the correct one. The fourth approach is a general algorithm for finding efficient ballot-polling and ballot-level comparison RLAs for IRV elections.
Each of these approaches involves simultaneously auditing a collection of different facts, whose conjunction is what we actually want to check. In the first case, we are interested in checking the complete elimination order; in later methods we are interested in a collection of facts which, taken together, imply that the announced winner truly won. Each individual audit is conducted to the same risk limit. If, at any point, any of the audits fails to reach a positive conclusion, we manually recount the whole election. This process constitutes a valid risk-limiting audit of the election result to that risk limit, assuming that our collection of chosen facts does indeed imply that the announced winner won. To see this, suppose that the announced election outcome is actually wrong. Then at least one fact in the collection must be false. The individual audit of that fact will therefore go to a full manual recount with probability at least one minus the risk limit, at which point we hand count the whole election.
Although the risk limit is preserved, the likelihood of unnecessarily manually recounting an election that is actually correct is higher when the conjunction of many facts is being checked. (Footnote: thanks to Damjan Vukcevic for pointing this out.)
5.1 Auditing a particular elimination order
The simplest approach to applying risk limiting auditing to IRV is to consider the IRV election as a number of simultaneous FPTP elections, one for each IRV round. This was suggested by Sarwate et al., although they do not explore it algorithmically. Note that this may perform much more auditing than required, since it verifies not only that the eventual winner is the correct winner, but also that every step in the IRV election was correct (with some confidence).
Given an election of candidates , we define the computed elimination order as where is the first eliminated candidate, the second, etc, and the eventual winner.
We treat each IRV round as a FPTP election. In round , we have a set of winning candidates (, the candidates that are still standing after round ) and a single losing candidate (, the candidate eliminated in round ). More formally, the set of candidates in the round FPTP election is . Each candidate has a recorded tally of . The loser of this election is and the set of winners denoted by .
5.1.1 Auditing the elimination order by ballot-polling
We can audit all these FPTP elections simultaneously by considering all the null hypotheses that would violate the announced result. These are . We represent these hypotheses by a pair of winner , and loser . The statistic maintained for this test is . Each loser loses in only one round so there is no ambiguity.
The algorithm is shown in Figure 4. The set of hypotheses are again pairs of winner and loser , but are interpreted as a hypothesis for the FPTP election corresponding to the round where was eliminated. The calculation of the expected ratio of votes must be made using the tallies from this round, and we must consider every ballot to see how it is interesting for that particular hypothesis. A ballot that is exhausted after rounds, for example, will not play a role when determining the statistics for later round hypotheses.
Consider the IRV election shown in Example 1. The null hypotheses we need to reject are , , and from the first round election, and from the second round election and from the final round. Under the same risk limit as in Example 2, the ASNs for the first round are those calculated in that example. The ASNs required to disprove the two stated null hypotheses for the second round election are 51.8 and 64.0. The ASN required to disprove the final round null hypothesis is 1186. The ASN of the overall audit is the maximum of the ASNs required to disprove all null hypotheses, across each round. For this election, this ASN is 6885. ∎
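The per-round figures in this example can be reproduced, to within rounding, with the following sketch. It assumes the usual BRAVO ASN approximation, a risk limit of 0.05, and that the vote proportions in each round are taken over the ballots still in play in that round (that is, excluding exhausted ballots). All three are assumptions made because they agree with the quoted numbers. The labels A to D and the function names are hypothetical.

```python
import math

def bravo_asn(winner_tally, loser_tally, ballots_in_play, alpha):
    """Approximate expected sample size to reject 'loser beat winner'."""
    p_w = winner_tally / ballots_in_play
    p_l = loser_tally / ballots_in_play
    s_w = winner_tally / (winner_tally + loser_tally)
    numerator = math.log(1.0 / alpha) + 0.5 * math.log(2.0 * s_w)
    denominator = p_w * math.log(2.0 * s_w) + p_l * math.log(2.0 - 2.0 * s_w)
    return numerator / denominator

# Each round: (tallies of candidates still standing afterwards, loser, loser tally).
ROUNDS = [
    ({"A": 26000, "B": 10000, "D": 15000}, "C", 9000),
    ({"A": 26000, "D": 24000}, "B", 10000),
    ({"D": 30000}, "A", 26000),
]

def elimination_order_asns(rounds, alpha):
    """ASN for every (winner, loser) hypothesis across all rounds."""
    asns = {}
    for winners, loser, loser_tally in rounds:
        in_play = sum(winners.values()) + loser_tally
        for w, w_tally in winners.items():
            asns[(w, loser)] = bravo_asn(w_tally, loser_tally, in_play, alpha)
    return asns

ASNS = elimination_order_asns(ROUNDS, alpha=0.05)
OVERALL = max(ASNS.values())  # the hardest hypothesis dominates the audit
```

The overall figure is driven entirely by the first-round hypothesis involving the candidates on 10000 and 9000 votes, which has no bearing on who wins the election.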
The weakness of this naive approach is that inconsequential earlier elimination rounds can be difficult to audit even if they are irrelevant to the winner.
Consider an election with candidates and ballots , , , , , . The elimination order is . Given , rejecting the null hypothesis that beat in the first round gives an ASN of indicating a full hand audit is required. But it is irrelevant to the election result. ∎
5.1.2 Auditing the elimination order by a comparison audit
Each of these FPTP elections can also be audited via a single application of MACRO (Figure 3) with , , and appropriate instantiations of the risk limit and inflator parameters. As in the ballot-polling context, we can audit each of these FPTP elections simultaneously. Unlike the ballot-polling audit, which applies BRAVO to each of a series of hypotheses in each round (one for each winner-loser pair), the comparison audit requires only a single application of MACRO for each IRV round.
Consider again the IRV election shown in Example 1. To audit the entire elimination order with a comparison audit, we treat each IRV round as an FPTP election and run a MACRO audit. Assuming , and , the expected number of ballot checks required by MACRO is the same as that calculated in Example 3. For the remaining two IRV rounds, the ASNs required by MACRO are 28.2 and 98.9. The ASN of the overall audit is the maximum of the ASNs required by MACRO in each round. For this election, this ASN is 395.4. In this case, auditing the entire elimination order by a comparison audit is likely to be more efficient than a corresponding ballot-polling audit. ∎
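The round-by-round figures in this example can be reproduced with a short sketch. It assumes the ASN of a comparison audit with no discrepancies is `-2 * gamma * ln(alpha) / mu`, where `mu` is the smallest winner-loser margin divided by the total number of reported ballots, together with a risk limit of 0.05 and an inflator of 1.1. These values are assumptions, chosen because they agree with the quoted numbers. The function name is hypothetical.

```python
import math

def macro_asn(min_margin, total_ballots, alpha, gamma):
    """Expected ballot checks for a comparison audit that finds no errors."""
    diluted_margin = min_margin / total_ballots
    return -2.0 * gamma * math.log(alpha) / diluted_margin

TOTAL_BALLOTS = 60000

# Smallest winner-loser margin in each IRV round of the running example:
# 10000 - 9000, 24000 - 10000 and 30000 - 26000.
MIN_MARGINS = [1000, 14000, 4000]

ROUND_ASNS = [macro_asn(m, TOTAL_BALLOTS, alpha=0.05, gamma=1.1) for m in MIN_MARGINS]
OVERALL = max(ROUND_ASNS)
```

Because the ASN is inversely proportional to the smallest margin, the first round, with its margin of 1000 votes, determines the cost of auditing the whole elimination order.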
5.2 Simultaneous elimination
It is common in IRV elections to eliminate multiple candidates in a single round if it can be shown that the order of elimination cannot affect later rounds. Given an elimination order we can simultaneously eliminate candidates if the sum of tallies of these candidates is less than the tally of the next lowest candidate. Let be the set of candidates standing after the first have been eliminated. We can simultaneously eliminate if:
This is because no matter which order the candidates in are eliminated no candidate could ever garner a tally greater than one of the candidates in . Hence they will all be eliminated in any case. As the remainder of the election only depends on the set of eliminated candidates and not their order, the simultaneous elimination can have no effect on later rounds of the election.
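The condition can be expressed as a small check. This is a sketch under the stated reading of the rule: a group of candidates may be eliminated together when the sum of their tallies is less than the tally of every other standing candidate. The tallies and labels below are hypothetical and are not taken from any example in this paper.

```python
def can_eliminate_together(tallies, group):
    """True if the combined tally of `group` is below the smallest tally
    of any standing candidate outside the group."""
    others = [t for c, t in tallies.items() if c not in group]
    if not others:  # at least one candidate must remain standing
        return False
    return sum(tallies[c] for c in group) < min(others)

# Hypothetical tallies: two minor candidates and three major ones.
TALLIES = {"A": 5000, "B": 5500, "C": 6000, "D": 3, "E": 3}

SMALL_GROUP = can_eliminate_together(TALLIES, {"D", "E"})
LARGER_GROUP = can_eliminate_together(TALLIES, {"A", "D", "E"})
TOO_LARGE = can_eliminate_together(TALLIES, {"A", "B", "D", "E"})
```

Note that a group passing this check may still be expensive to audit when its combined tally is close to the next lowest tally, so it can be worth comparing the estimated effort of both options.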
We can model the simultaneous elimination for auditing by considering all the simultaneously eliminated candidates as a single loser . As with the audit of a particular elimination sequence, we are proving a stronger result than necessary, i.e. that a particular sequence of (possibly multiple) eliminations is valid, though there may be another way of getting the same candidate to win even if the multiple elimination is not correct.
This often results in a much lower ASN, though not necessarily: sometimes the combined total of first preferences in is very close to the next tally, so a lot of auditing is required. It may be better to audit each elimination individually in this case. It is possible to compute the ASN for each approach and choose the method that requires the least auditing, assuming the outcome is correct.
5.2.1 Simultaneous elimination by ballot-polling
We want to reject hypotheses for each . The statistic in this case is increased when we draw a ballot where is the highest-ranked of remaining candidates , and decreased when we draw a ballot where is the highest-ranked of remaining candidates .
The elimination of all these null hypotheses is sufficient to prove that the multiple elimination is correct. This can then be combined with the audit of the rest of the elimination sequence, as described in Section 5.1, to test whether the election’s announced winner is correct.
Consider the election in Example 5. We can multiply eliminate the candidates since the sum of their tallies which is the lowest tally of the other candidates (, , and ). If we do this the difficult first round elimination auditing disappears. This shows the benefit of multiple elimination. The ASNs required for the joint elimination of are 17.0, 36.2 and 49.1 as opposed to requiring a full hand audit.
After this simultaneous elimination, the tallies for the three candidate election are , and and the ASNs to reject the hypotheses and are 77.6 and 1402 respectively.
We could also simultaneously eliminate since the sum of their tallies which is the lowest tally of the other candidate (that of ). But this will lead to a very difficult hypothesis to reject, since the tallies are almost identical! The ASN is 158,156,493! This illustrates that simultaneous elimination may not always be beneficial. ∎
5.2.2 Simultaneous elimination by a ballot-level comparison RLA
As in the ballot-polling context, we treat any simultaneously eliminated candidates as a single loser , eliminated in a single round . We treat each round as a FPTP election, audited via a single application of MACRO.
Consider again the election in Example 5, in the setting where we simultaneously eliminate candidates in the first round. When viewed as a single losing candidate , the winner-loser pairwise margins in this first round FPTP election are , , and . Assuming and , the expected number of ballot checks required by MACRO to audit this first round FPTP election is 36.2. In the second round FPTP election, candidates , and remain with winners and loser . The winner-loser pairwise margins in this election are and , with . The expected number of ballot checks required by MACRO to audit this election is 145. In the final round election, our winner is and loser , with . The expected number of ballot checks required by MACRO to audit this election is 48.3. The overall ASN for the comparison audit, given simultaneous elimination of candidates and , is 145. This is less than that of the ballot-polling variant at 1402. ∎
5.3 Winner only auditing
The above two methods audit the entire IRV process to ensure that we are confident in all its outcomes, i.e., that the correct candidate was eliminated in each round. This is stronger than necessary, since earlier eliminations happening in a different order may have no effect on the eventual winner.
Consider an election with ballots , and . No simultaneous elimination is possible, and auditing that is eliminated before will certainly require a full hand audit. But even if were eliminated first it would not change the winner of the election. ∎
5.3.1 Winner only auditing via ballot-polling
An alternate approach to ballot-polling RLAs for IRV elections is to simply reject the null hypotheses where is the declared winner of the IRV election, and . This hypothesis states that gets more votes than where is given the maximal possible votes it could ever achieve before is eliminated, and gets only its first round votes (the minimal possible votes it could ever hold). When we reject this hypothesis we are confident that there could not be any elimination order where is eliminated before . If all these hypotheses are rejected then we are assured that is the winner of the election, independent of a particular elimination order.
Consider the election of Example 9. We must reject the hypotheses that ( is eliminated before ) and ( is eliminated before ). The primary vote for is 10000, while the maximum tally that can achieve before is eliminated is 6000. The maximum tally that can achieve before is eliminated is 5999. Auditing to reject these hypotheses is not difficult. The ASNs are 98.4 and 98.3 ballots.
Note that if the ballots were changed to , the maximum tally that can achieve is 12000, and the hypothesis that ( is eliminated before ) could not be rejected. In this case just changing a single vote could result in winning the election, so this election will need a full recount. ∎
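The two quantities compared in a winner only audit can be computed as in the sketch below. The winner is credited only with first preferences, while a rival is credited with every ballot on which it appears before the winner, or appears when the winner does not. The profile and the labels X, Y and Z are hypothetical, chosen to agree with the tallies of 10000, 6000 and 5999 quoted in the example; the function names are assumptions.

```python
def primary_vote(profile, candidate):
    """Number of ballots ranking `candidate` first."""
    return sum(n for ballot, n in profile.items() if ballot and ballot[0] == candidate)

def max_tally_before(profile, rival, winner):
    """Largest tally `rival` could hold while `winner` is still standing:
    ballots where rival precedes winner, or rival appears and winner does not."""
    total = 0
    for ballot, n in profile.items():
        if rival not in ballot:
            continue
        if winner not in ballot or ballot.index(rival) < ballot.index(winner):
            total += n
    return total

# Hypothetical profile with reported winner X.
PROFILE = {
    ("X", "Y", "Z"): 10000,
    ("Y", "X", "Z"): 6000,
    ("Z", "X", "Y"): 5999,
}

WINNER_MIN = primary_vote(PROFILE, "X")
RIVAL_MAX = {c: max_tally_before(PROFILE, c, "X") for c in ("Y", "Z")}
```

Since the winner's minimum tally exceeds each rival's maximum tally in this profile, no elimination order can remove X before Y or Z, which is the fact each hypothesis test seeks to confirm.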
5.3.2 Winner only auditing via a ballot-level comparison RLA
The ballot-level comparison RLA version of the winner only audit proceeds in a similar fashion to its ballot-polling counterpart. Given an election with winner and losers , the ballot-polling audit executes a BRAVO audit for each winner-loser pair, where . In each of these audits, is awarded only their first preference votes , while is awarded all votes in which they appear before , or where they appear and does not . This audit is designed to disprove the null hypothesis that . In the ballot-level comparison RLA version of this audit, we apply the MACRO algorithm of Figure 3, in place of BRAVO, for each winner-loser pair , with and .
Consider the election of Example 9. For winner-loser pair , we apply MACRO to an election with winner , and loser , where has a tally of 10000 votes and a tally of 5999 votes. Even though appears before in the [, , ] ballots, we only award with its first preference votes in a winner only audit. If the positions of and were swapped in these ballots, these ballots would be treated as votes for . In this election, and we expect to check 36.2 ballots. For winner-loser pair , we apply MACRO to an election with winner , and loser , where has a tally of 10000 votes and a tally of 6000. The ASN for this election is also 36.2 ballots. ∎
6 A general algorithm for finding efficient ballot-polling or comparison RLAs for IRV
In each of the ballot-polling and ballot-level comparison RLAs for IRV described in the preceding sections, we apply an existing risk limiting audit (BRAVO, as per Figure 4, or MACRO, as per Figure 3) to confirm a collection of facts with a given level of statistical confidence. In the case of a winner only audit, for example, we are seeking to confirm that the reported winner could not have been eliminated before any one of the reported losers . This results in facts to be confirmed, one for each winner-loser pair.
For each fact that we seek to confirm, we can estimate the number of ballots that must be checked via an application of BRAVO or MACRO, assuming no errors are found. We present a general algorithm for choosing the set of facts that can be checked most efficiently to confirm that the reported winner was the correct one. The algorithm, denoted audit-irv, achieves this by finding the easiest way to show that all election outcomes in which a candidate other than won, did not arise, with a given level of statistical confidence, for a given method of auditing each fact. The audit-irv algorithm can be applied to generate either a ballot-polling or a ballot-level comparison RLA for an IRV election.
Note that our risk limit follows directly from BRAVO and MACRO: if the election outcome is wrong, then one of the facts in must be false, so a BRAVO or MACRO audit with risk limit will detect this with probability at least , and we then manually recount the whole election. However, our estimate of efficiency is only heuristic: ASNs for testing a single fact can be derived analytically, but the expected number of samples required to reject multiple hypotheses at once is very hard to compute, even if there are no discrepancies. We make a best guess based on the maximum ASN for any single fact; this is what we mean by “optimal” in this section, though it may not guarantee an optimally efficient audit overall.
Our algorithm, audit-irv, outlined in Figure 5, explores the tree of alternate elimination sequences, ending in a candidate . Each node is a partial (or complete) elimination sequence. For each node , we consider the set of hypotheses that (i) can be proven with an application of BRAVO or MACRO and (ii) any one of which disproves the outcome that represents. We label each node with the hypothesis from this set that requires the least number of anticipated ballot samples (ASN) to prove, denoted . We use the notation and to represent the hypothesis assigned to and the ASN for this hypothesis, respectively. Our algorithm finds a set of hypotheses to prove, denoted audits, that: validates the correctness of a given election outcome, with risk limit ; and for which the largest ASN of these hypotheses is minimised. When performing a ballot-polling audit, we compute this ASN via Equation 3, and Equation 6 when generating a comparison RLA.
Consider a partial elimination sequence of at least two candidates, leading to an alternate winner . This sequence represents the suffix of a complete order – an outcome in which the candidates in have been previously eliminated, in some order. We define a function FindBestAudit(, , , [, ]) that finds the easiest to prove hypothesis (or fact) , with the smallest ASN, which disproves the outcome given risk limit . The parameter is only used when generating a ballot-level comparison RLA to audit the given election. For the outcome , FindBestAudit considers the following hypotheses:
Hypothesis that beats , for some , in a winner only audit of the form described in Section 5.3, with winner and loser , thus invalidating the sequence since cannot be eliminated before ;
Hypothesis that beats in a winner only audit with winner and loser , thus invalidating the sequence since cannot be eliminated before ;
Hypothesis that beats some in a BRAVO (or MACRO) audit with winner and loser , under the assumption that the only candidates remaining are those in (i.e. the set ) with other candidates eliminated with their votes distributed to later preferences, thus invalidating the sequence since then is not eliminated at this stage in an IRV election.
We assume that if no hypothesis exists with ASN less than the function returns a dummy INF hypothesis with .
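The third kind of hypothesis compares tallies under the assumption that only a given set of candidates remains. A minimal sketch of that tally computation follows, assuming ballots are tuples of candidate labels in preference order; the function name `tallies_with_remaining` is hypothetical.

```python
def tallies_with_remaining(ballots, remaining):
    """Tally each ballot for its highest-ranked candidate in `remaining`.

    All other candidates are treated as already eliminated, so their
    votes flow to later preferences. A ballot that ranks no remaining
    candidate is exhausted and counts for nobody.
    """
    remaining = set(remaining)
    tallies = {c: 0 for c in remaining}
    for ranking in ballots:
        for candidate in ranking:
            if candidate in remaining:
                tallies[candidate] += 1
                break
    return tallies


# Hypothetical ballots, for illustration only.
example = [("a", "b", "c"), ("b", "c"), ("c", "a"), ("d",)]
print(tallies_with_remaining(example, {"a", "c"}))
```

Any pair of candidates in the remaining set can then be audited as a two-candidate contest using these tallies.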
For an election with candidates and winner , audit-irv starts by adding partial elimination orders to an initially empty priority queue , one for each alternate winner (Steps 4 to 9). The set audits is initially empty. For orders containing a single candidate , FindBestAudit considers the hypotheses WO(,), candidate beats in a winner only audit of the form described in Section 5.3, with winner and loser , for each . The hypothesis with the smallest is recorded in . The (current) best ancestor for is recorded in ; for these singleton sequences, it is always the sequence itself.
We repeatedly find and remove a partial sequence in for expansion (Steps 11 and 12). This is the sequence with the (equal) highest ASN. If the best ancestor for this sequence has an ASN lower than the current lower bound (Steps 13 to 16) we simply add the corresponding hypothesis to and remove any sequences in which are subsumed by this ancestor (have it as a suffix), and restart the main loop.
Otherwise (Steps 17 to 31) we create a new elimination sequence with appended to the start of () for each . For a new sequence , FindBestAudit finds the hypothesis requiring the least auditing effort to prove. We record (Step 20) this as the hypothesis for .
We calculate (Step 21) the best ancestor of by comparing the ASN for its hypothesis with that of its ancestor. If the sequence is complete, then we know that one of its ancestors (including itself) must be audited. If the best of these is infinite, we terminate: a full recount is necessary. Otherwise we add the hypothesis of its best ancestor to and remove all sequences in which are subsumed by this ancestor. If the sequence is not complete we simply add it into the set of sequences to be expanded .
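The expansion and pruning steps described above can be sketched as follows. This is only an illustration of the bookkeeping, not the full algorithm of Figure 5: sequences are assumed to be tuples with the alternate winner last, and the helper names `expand`, `is_subsumed`, and `prune_frontier` are hypothetical.

```python
def expand(sequence, candidates):
    """Create the children of a partial elimination sequence.

    Each child prepends one candidate not already in the sequence,
    i.e. one more candidate eliminated just before the current prefix.
    """
    return [(c,) + tuple(sequence) for c in candidates if c not in sequence]


def is_subsumed(sequence, ancestor):
    """True if `ancestor` is a suffix of `sequence`."""
    n = len(ancestor)
    return n <= len(sequence) and tuple(sequence[len(sequence) - n:]) == tuple(ancestor)


def prune_frontier(frontier, ancestor):
    """Drop every sequence that is ruled out once `ancestor` is audited."""
    return [s for s in frontier if not is_subsumed(s, ancestor)]


frontier = [("a", "b", "c"), ("b", "c"), ("a", "c"), ("c", "b")]
print(prune_frontier(frontier, ("b", "c")))
print(expand(("b", "c"), ["a", "b", "c", "d"]))
```

Because children are formed by prepending, every descendant of a node shares that node as a suffix, which is why disproving one ancestor removes its whole subtree.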
Consider an election with ballots , , , , , and , and candidates to . The initial tallies are: : 10000; : 6500; : 5500; : 5000. Candidates , , and are eliminated, in that order, with winner . In a ballot-polling or comparison winner only audit (), we cannot show that beats , or that beats , as ’s first preference tally (of 10000 votes) is less than the total number of ballots that we could attribute to and (11500 and 10500, respectively). Simultaneous elimination is not applicable in this instance, as no sequences of candidates can be eliminated in a group. In an audit of the whole elimination order (as per Section 5.1), the loss of to , , and is the most challenging to audit. The ASN for the ballot-polling version of this audit, assuming , is 25% of all ballots (6750 ballots). The comparison version of this audit, assuming , is 1.3% of all ballots (351 ballots).
Our audit-irv algorithm finds a set of hypotheses that can be proven using a ballot-polling audit with a maximum ASN of 1% (or 270 ballots, with ), and that consequently rule out all elimination sequences that end in a candidate other than . This audit tests the hypotheses: beats if and have been eliminated (ASN of 1%); beats if and have been eliminated (ASN 0.5%); beats in a winner only audit (ASN 0.4%); and that beats if has been eliminated (ASN 0.1%). If we instead use audit-irv to construct a ballot-level comparison audit, we find a set of hypotheses that can be tested with a maximum ASN of 0.17%. These hypotheses are: beats if and have been eliminated (ASN of 0.17%); beats if has been eliminated (ASN of 0.07%) and if and have been eliminated (ASN of 0.11%); beats in a winner only audit (ASN of 0.13%); and that both and beat if all remaining candidates have been eliminated (ASN of 0.04%). ∎
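The percentages and ballot counts quoted in this example can be cross-checked with a few lines of arithmetic. The sketch below assumes the four initial tallies account for every ballot cast; the candidate labels `c1` to `c4` are placeholders, since the example's own symbols are not reproduced here.

```python
# Initial tallies from the example; labels are placeholders.
initial_tallies = {"c1": 10000, "c2": 6500, "c3": 5500, "c4": 5000}
total_ballots = sum(initial_tallies.values())


def percent_to_ballots(percent, total):
    """Convert an ASN expressed as a percentage of ballots into a count."""
    return round(total * percent / 100)


print(total_ballots)                            # total ballots cast
print(percent_to_ballots(25, total_ballots))    # elimination-order, ballot-polling
print(percent_to_ballots(1.3, total_ballots))   # elimination-order, comparison
print(percent_to_ballots(1, total_ballots))     # audit-irv, ballot-polling
```

Under that assumption the counts agree with the figures stated in the text (6750, 351, and 270 ballots out of 27000).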
7 Computational Results
We have simulated the ballot-polling and ballot-level comparison RLAs described in Section 5.1 (auditing the elimination order, EO), Section 5.2 (auditing with simultaneous elimination, SE), and Section 5.3 (winner only auditing, WO), on 21 US IRV elections held between 2007 and 2014, and on the IRV elections held across 93 electorates in the 2015 state election in New South Wales (NSW), Australia. For each election, we have simulated each of these audits with varying risk limits ( and ), and . (We explore the influence of the parameter in subsequent experiments.) We record, for each simulated audit, the number of ballots that were sampled during the audit (expressed as a percentage of ballots cast). An audit that needs to sample fewer ballots before confirming the correctness of the reported outcome, to the given degree of statistical confidence, is a more efficient audit. As each audit involves ballots being drawn at random, we simulate each audit 10 times and compute the average number of ballots checked across those 10 simulations.
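To illustrate the kind of simulation involved, here is a minimal sketch of a single two-candidate BRAVO test averaged over 10 seeded runs. It assumes the standard BRAVO sequential ratio test, sampling with replacement, and a reported outcome that is correct; it is not the simulation code used for the experiments, and the function names and the cap at the total ballot count are assumptions.

```python
import math
import random


def bravo_ballots_drawn(winner_votes, loser_votes, other_votes, alpha, rng):
    """Simulate one BRAVO test of 'winner beat loser'; return ballots drawn.

    Returns the total number of ballots if the test has not concluded by
    then, which is treated here as a full recount.
    """
    if not 0 < loser_votes < winner_votes:
        raise ValueError("expected 0 < loser_votes < winner_votes")
    total = winner_votes + loser_votes + other_votes
    share = winner_votes / (winner_votes + loser_votes)  # reported winner share
    log_stat = 0.0
    threshold = math.log(1.0 / alpha)
    for drawn in range(1, total + 1):
        pick = rng.randrange(total)
        if pick < winner_votes:
            log_stat += math.log(share / 0.5)
        elif pick < winner_votes + loser_votes:
            log_stat += math.log((1.0 - share) / 0.5)
        # a ballot for neither candidate leaves the statistic unchanged
        if log_stat >= threshold:
            return drawn
    return total


def average_ballots_drawn(winner_votes, loser_votes, other_votes, alpha, runs=10):
    """Average the number of ballots drawn over `runs` seeded simulations."""
    draws = [
        bravo_ballots_drawn(winner_votes, loser_votes, other_votes, alpha,
                            random.Random(seed))
        for seed in range(runs)
    ]
    return sum(draws) / runs


avg = average_ballots_drawn(10000, 5999, 0, alpha=0.05)
print(f"average ballots drawn: {avg:.1f} ({100 * avg / 15999:.2f}% of ballots)")
```

An IRV audit built from several such tests would stop only when every test has concluded, so its sample size is driven by the hardest of them.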
All experiments have been conducted on a machine with an Intel Xeon Platinum 8176 chip (2.1GHz), and 1TB of RAM.
Table LABEL:tab:BPvsCP_EO compares the number of ballot checks required by ballot-polling and ballot-level comparison audits of the form described in Section 5.1 across our suite of election instances. The number of required ballot samples is reported alongside the ASN for each audit (computed as per Equation 3 for each ballot-polling audit, and Equation 6 for each comparison audit), and the margin of victory (MOV) for each election (computed using the algorithm of Blom et al ). Tables LABEL:tab:BPvsCP_SE and LABEL:tab:BPvsCP_WO similarly compare the number of ballot samples required by simultaneous elimination and winner-only ballot-polling and comparison audits, described in Sections 5.2 and 5.3. In this experiment, no errors or discrepancies have been injected into the set of reported ballots in each election instance ().
Tables LABEL:tab:BPvsCP_EO to LABEL:tab:BPvsCP_WO show that performing a winner only audit can be much easier than auditing the full elimination order (with or without the use of simultaneous elimination), irrespective of whether we are conducting a ballot-polling or comparison audit. This is the case for the 2013 Minneapolis Mayor and 2014 Oakland Mayor elections. In some cases, winner only audits are more challenging (or not possible) as we seek to show that a candidate (on just their first preference votes) could have beaten another (who is given all votes in which they appear before or in which they appear, but does not). Even if does beat in the true outcome of the election, this audit may not be able to prove this (see Pierce 2008 County Executive, Oakland 2012 D5 City Council, and Aspen 2009 Mayor for examples).
Auditing with simultaneous elimination (grouping several eliminated candidates into a single ‘super’ candidate) can be more efficient than auditing each individual elimination. This is evident in the context of both ballot-polling audits (see Berkeley 2010 D8 City Council, Berkeley 2012 Mayor, Oakland 2010 Mayor, San Francisco 2007 Mayor, and Sydney NSW) and ballot-level comparison audits (see Balmain NSW 2015, Sydney NSW 2015, Oakland 2010 Mayor, San Leandro 2010 Mayor, and Berkeley 2010 D8 City Council). Across the 26 election instances in Tables LABEL:tab:BPvsCP_EO and LABEL:tab:BPvsCP_SE, conducting a comparison audit with simultaneous elimination was beneficial in 15 instances and detrimental in 2. In the context of ballot-polling audits, simultaneous elimination was beneficial in 8 and detrimental in 5. In some instances, the tally of the super candidate is quite close to that of the next eliminated candidate, resulting in a more challenging audit. This is particularly evident when simulating a ballot-polling audit of the Campbelltown NSW and Berkeley 2010 D4 City Council elections.
Tables LABEL:tab:BPvsCP_EO to LABEL:tab:BPvsCP_WO show that comparison audits are generally more efficient than their ballot-polling counterparts, as they are for the underlying FPTP election. The Oakland 2012 D3 City Council election is an excellent example. In the ballot-polling context, none of the three approaches (auditing the entire elimination sequence, auditing the sequence with simultaneous elimination, or conducting a winner-only audit) is successful: the ASN is more than the total number of ballots in each case. With each of these methods, however, we can conduct a comparison audit that requires only a fraction of cast ballots to be sampled (23% or 6155 ballots, 23%, and 0.1% or 268 ballots, when auditing the entire elimination order, auditing with simultaneous elimination, and conducting a winner-only audit, respectively). For each simulated audit, increasing the risk limit reduced the average number of required ballot samples, as expected.
Table LABEL:tab:AUDITIRV-NOERRORS reports the average number of ballots examined by the ballot-polling and ballot-level comparison audits generated by our audit-irv algorithm across the 26 considered IRV elections (with %). We compare this level of auditing effort against the number of ballot checks required by the best alternate auditing method (auditing the entire elimination order [EO], simultaneous elimination [SE], and winner-only auditing [WO]). Recall that audit-irv finds an appropriate set of facts to audit (via ballot-polling or a comparison audit) that, if shown to hold with a given degree of statistical confidence, confirms the reported election outcome with that degree of statistical confidence. The audit-irv algorithm finds the set of such facts requiring the least anticipated number (ASN) of ballot checks to confirm. Table LABEL:tab:AUDITIRV-NOERRORS shows that, while the ASN of the audit-irv audits is minimal, the actual level of auditing effort required by these audits will differ from these estimates, and may be greater than that required by an EO, SE, or WO audit. For ballot-polling audits the discrepancy can be large. In these experiments we have not introduced any errors or discrepancies between the electronic ballot records and the paper ballots. In this setting, the ASN computed for a ballot-level comparison audit accurately represents the actual number of ballot checks or polls made during the audit.
In all but one of the election instances in Table LABEL:tab:AUDITIRV-NOERRORS, audit-irv is able to compute an audit configuration in less than 1 minute. The algorithm requires between 0.003s and 106s to find an audit configuration in the ballot-polling context, and 0.002s to 139s in the comparison audit context. The most time consuming instance is the 2014 Oakland Mayoral election, with audit-irv requiring 106s and 139s to find the best ballot-polling and comparison audit, respectively.
In 22 of the 26 election instances of Table LABEL:tab:AUDITIRV-NOERRORS, the ballot-polling audit generated by audit-irv required a similar number of ballot samples to that of the best alternate method (EO, SE, or WO). In the remaining 4 instances, the audit-irv audit was significantly more efficient. Consider instances 12 and 13 – the Oakland 2012 City Council election for District 3, and the Pierce 2008 County Assessor election. For these instances, none of the EO, SE, or WO methods was able to audit the election without manually recounting all ballots. The audit-irv audits, however, were able to confirm the reported outcomes in these elections by sampling no more than 17% of the cast ballots, on average, when simulated. The comparison audits generated by audit-irv are significantly more efficient than their ballot-polling counterparts, across our suite of election instances.
Consider instance 17 in Table LABEL:tab:AUDITIRV-NOERRORS – the Aspen 2009 City Council election. We can, with a comparison audit generated with audit-irv, confirm the reported outcome (with risk limit %) by sampling just under 10% of the cast ballots (254 ballots), on average. If we were to use one of the EO, SE, or WO approaches of conducting a comparison audit, in place of the audit generated by audit-irv, we would need to sample just under 83% of the cast ballots (2112 ballots), on average. Table LABEL:tab:AUDITIRV-NOERRORS also shows that as the parameter increases, the number of ballots checked in a comparison audit may increase slightly, but not significantly.
We have shown that audit-irv is able to find efficient ballot-polling and ballot-level comparison audit configurations across a range of election instances, in the context where electronic ballot records exactly match their corresponding paper ballot. We now consider the effectiveness of our audits in the setting where varying numbers of errors (or discrepancies) are introduced into the reported (electronic or digitised) ballot records. We introduce discrepancies between reported and actual ballots according to a defined error rate, which we vary between 1% and 10%. This means that for any given ballot, there is a 1% to 10% probability that its electronic version differs, in some way, from the paper version. Recall that the electronic record of a paper ballot is a partial or complete sequence of candidates, ordered according to voter preference. We introduce an error in a reported ballot record with one of the following operations: replacing a randomly selected candidate in this preference ordering with a randomly selected candidate that does not appear in the ordering; inserting a randomly selected candidate that does not appear in the ordering into a randomly selected position; flipping the positions of two randomly selected candidates in the ordering; removing a randomly selected candidate in the ordering. For each reported ballot, we roll a die to determine whether to introduce an error. When introducing an error, we uniformly randomly choose one of the above manipulations to perform.
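The four manipulations listed above can be sketched as follows. This is an illustrative reconstruction, not the code used in the experiments. In particular, the text does not say what happens when a manipulation cannot be applied (for example, replacing a candidate on a ballot that already ranks everyone), so the sketch assumes a uniform choice among the applicable manipulations only. The function names are hypothetical.

```python
import random


def inject_error(ranking, candidates, rng):
    """Return a copy of `ranking` altered by one randomly chosen manipulation."""
    ranking = list(ranking)
    unused = [c for c in candidates if c not in ranking]
    # Assumption: choose uniformly among the manipulations that apply.
    ops = []
    if ranking and unused:
        ops.append("replace")
    if unused:
        ops.append("insert")
    if len(ranking) >= 2:
        ops.append("flip")
    if ranking:
        ops.append("remove")
    if not ops:
        return ranking
    op = rng.choice(ops)
    if op == "replace":
        ranking[rng.randrange(len(ranking))] = rng.choice(unused)
    elif op == "insert":
        ranking.insert(rng.randrange(len(ranking) + 1), rng.choice(unused))
    elif op == "flip":
        i, j = rng.sample(range(len(ranking)), 2)
        ranking[i], ranking[j] = ranking[j], ranking[i]
    else:  # remove
        del ranking[rng.randrange(len(ranking))]
    return ranking


def corrupt_ballots(ballots, candidates, error_rate, seed):
    """Alter each ballot independently with probability `error_rate`."""
    rng = random.Random(seed)
    return [
        inject_error(b, candidates, rng) if rng.random() < error_rate else list(b)
        for b in ballots
    ]


cands = ["a", "b", "c", "d"]
paper = [("a", "b"), ("c",), ("a", "b", "c", "d"), ("d", "a", "c")]
print(corrupt_ballots(paper, cands, error_rate=0.5, seed=1))
```

Every manipulation yields a ranking that differs from the original and still lists each candidate at most once, so an altered record is always a genuine discrepancy.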
In this setting, we simulate each auditing approach 50 times – with 10 different seeds used to inject errors into electronic (reported) ballot records, and 5 seeds used to randomly draw (sample) ballots during the audit. When reporting the ASNs and actual number of ballots sampled by each auditing method, we average these values over the 50 simulated audits. Tables LABEL:tab:AUDITIRV-1PCERRORS to LABEL:tab:AUDITIRV-10PCERRORS report the ASN and actual number of ballot samples required, on average, across the simulation of varying types of audit in each of our 26 election instances, with a 1% to 10% error rate, %, and . We compare the EO, SE, and WO auditing methods, in both a ballot-polling and comparison audit context, against the audits generated by audit-irv. Tables LABEL:tab:AUDITIRV-1PCERRORS to LABEL:tab:AUDITIRV-10PCERRORS show that even when there are discrepancies between actual and reported ballots: comparison audits are still more efficient, in general, than ballot-polling audits; and audit-irv is able to generate efficient audits that sample only a small fraction of cast ballots.
As the rate of introduced errors increases toward 10%, the ASNs associated with the comparison audits generated by audit-irv significantly underestimate the actual auditing effort required in a small number of instances. This is the case in instances 10 (Pierce 2008 County Executive), 13 (Pierce 2008 County Assessor), 14 (San Leandro 2010 Mayor), 17 (Aspen 2009 City Council), 18 (Oakland 2010 Mayor) and 24 (Gosford NSW 2015). The MOV in each of these elections is less than 1% of the total ballots cast. Our results indicate that for very close elections, with a very small margin of victory, the impact of each discrepancy encountered in the sampling of ballots has a significant influence on the statistics being maintained throughout the comparison audit. Recall that the MACRO algorithm of Figure 3 repeatedly samples ballots until a running Kaplan-Markov MACRO P-value () falls below the given risk limit . When we discover a discrepancy that has resulted in the margin between a winning and losing candidate being overstated (i.e., thought to be larger than it actually is), this statistic increases at a rate that is proportional to the inverse of the election MOV. For elections with a very small MOV, each discovered error may significantly increase the ASN of the audit. In these instances, a full manual recount is likely to be required (and indeed, the announced outcome may be wrong).
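The sensitivity to the margin can be illustrated with a rough sketch. It assumes the Kaplan-Markov P-value takes the product form used in "super-simple" comparison audits, where each sampled ballot multiplies the running value by (1 - 1/U) / (1 - e / (2 * gamma)), with U = 2 * gamma / margin, e the overstatement on that ballot in votes, and the margin expressed as a fraction of ballots cast. The exact statistic of Figure 3 may differ, and the default gamma of 1.1 and the function names are assumptions.

```python
import math


def km_factor(overstatement, diluted_margin, gamma=1.1):
    """Multiplicative change in the P-value from one sampled ballot."""
    u = 2.0 * gamma / diluted_margin
    return (1.0 - 1.0 / u) / (1.0 - overstatement / (2.0 * gamma))


def clean_ballots_to_offset(overstatement, diluted_margin, gamma=1.1):
    """Discrepancy-free ballots needed to undo one overstatement."""
    growth = -math.log(1.0 - overstatement / (2.0 * gamma))
    shrink_per_clean = -math.log(1.0 - diluted_margin / (2.0 * gamma))
    return math.ceil(growth / shrink_per_clean)


for margin in (0.10, 0.01, 0.001):
    print(margin, clean_ballots_to_offset(1, margin))
```

Under these assumptions, the number of additional matching ballots needed to recover from a single one-vote overstatement grows roughly in proportion to the inverse of the margin, which is consistent with the behaviour observed for the very close elections listed above.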
We have presented and evaluated several methods for conducting ballot-polling and ballot-level comparison RLAs for IRV elections. These approaches represent the first practical techniques for conducting RLAs for IRV. As in FPTP, we find that comparison-based IRV audits are, in general, more efficient than their ballot-polling counterparts. These audits typically require only a small fraction of cast ballots to be sampled, though very close elections (with a MOV that is less than 1% of cast ballots, for example) generally require a full manual recount. We have presented an algorithm, denoted audit-irv, for designing efficient ballot-polling and ballot-level comparison-based RLAs for a given IRV election. This algorithm finds a collection of facts to audit that require the least number of expected ballot checks to confirm (assuming the announced outcome is correct), while still guaranteeing that a wrong result will be detected with probability at least . The audit configurations generated with this algorithm are competitive with alternate methods considered throughout the paper, and in some cases are substantially more efficient.
-  T. Antonyan, S. Davtyan, S. Kentros, A. Kiayias, L. Michel, N. Nicolaou, A. Russell, and A. A. Shvartsman. State-wide elections, optical scan voting systems, and the pursuit of integrity. IEEE Transactions on Information Forensics and Security, 4(4):597–610, 2009.
-  B. Beckert, M. Kirsten, V. Klebanov, and C. Schürmann. Automatic margin computation for risk-limiting audits. In International Joint Conference on Electronic Voting, pages 18–35. Springer, 2016.
-  J. Benaloh, D. Jones, E. Lazarus, M. Lindeman, and P.B. Stark. SOBA: Secrecy-preserving observable ballot-level audit. In USENIX Accurate Electronic Voting Technology Workshop, 2011.
-  M. Blom, P. J. Stuckey, V. Teague, and R. Tidhar. Efficient Computation of Exact IRV Margins. In European Conference on AI (ECAI), pages 480–487, 2016.
-  S. Checkoway, A. D. Sarwate, and H. Shacham. Single-ballot risk-limiting audits using convex optimization. In EVT/WOTE, 2010.
-  J.L. Hall, L.W. Miratrix, P.B. Stark, M. Briones, E. Ginnold, F. Oakley, M. Peaden, G. Pellerin, T. Stanionis, and T. Webber. Implementing risk-limiting post-election audits in California. In Proc. 2009 Electronic Voting Technology Workshop/Workshop on Trustworthy Elections (EVT/WOTE ’09), Montreal, Canada, August 2009. USENIX.
-  M. Lindeman and P.B. Stark. A gentle introduction to risk-limiting audits. IEEE Security and Privacy, 10:42–49, 2012.
-  M. Lindeman, P.B. Stark, and V. Yates. BRAVO: Ballot-polling risk-limiting audits to verify outcomes. In Proceedings of the 2011 Electronic Voting Technology Workshop / Workshop on Trustworthy Elections (EVT/WOTE ’11). USENIX, 2012.
-  Mark Lindeman. Evidence-based elections: Beyond the “rigging” debate. Significance, 14(1):18–23, 2017.
-  Mark Lindeman, Neal McBurnett, Kellie Ottoboni, and Philip B Stark. Next steps for the Colorado risk-limiting audit (CORLA) program. arXiv preprint arXiv:1803.00698, 2018.
-  T.R. Magrino, R.L. Rivest, E. Shen, and D.A. Wagner. Computing the margin of victory in IRV elections. In USENIX Accurate Electronic Voting Technology Workshop: Workshop on Trustworthy Elections, USENIX Association Berkeley, CA, USA, 2011.
-  Lawrence D Norden and Ian Vandewalker. Securing Elections from Foreign Interference. Brennan Center for Justice at the New York University School of Law, 2017.
-  Ronald L Rivest and Philip B Stark. When is an election verifiable? IEEE Security & Privacy, 15(3):48–50, 2017.
-  A.D. Sarwate, S. Checkoway, and H. Shacham. Risk-limiting audits and the margin of victory in nonplurality elections. Statistics, Politics, and Policy, 3(3):29–64, 2013.
-  P.B. Stark. Risk-limiting post-election audits: P-values from common probability inequalities. IEEE Transactions on Information Forensics and Security, 4:1005–1014, 2009.
-  P.B. Stark and M. Lindeman. A gentle introduction to risk-limiting audits. IEEE Security and Privacy, 10:42–49, 2012.
-  Philip B Stark. Auditing a collection of races simultaneously. arXiv preprint arXiv:0905.1422, 2009.
-  Philip B Stark. Efficient post-election audits of multiple contests: 2009 California tests. 2009.
-  Philip B Stark. Super-simple simultaneous single-ballot risk-limiting audits. In EVT/WOTE, 2010.
-  Philip B Stark and Vanessa Teague. Verifiable European elections: Risk-limiting audits for D’Hondt and its relatives. USENIX Journal of Election Technology and Systems (JETS), 1(3):18–39, 2014.