Election Control by Manipulating Issue Significance

07/19/2020 ∙ by Andrew Estornell, et al. ∙ University of Oxford, Washington University in St Louis

Integrity of elections is vital to democratic systems, but it is frequently threatened by malicious actors. The study of algorithmic complexity of the problem of manipulating election outcomes by changing its structural features is known as election control. One means of election control that has been proposed is to select a subset of issues that determine voter preferences over candidates. We study a variation of this model in which voters have judgments about relative importance of issues, and a malicious actor can manipulate these judgments. We show that computing effective manipulations in this model is NP-hard even with two candidates or binary issues. However, we demonstrate that the problem is tractable with a constant number of voters or issues. Additionally, while it remains intractable when voters can vote stochastically, we exhibit an important special case in which stochastic voting enables tractable manipulation.


1 Introduction

Fair elections are at the core of democratic systems. However, elections are increasingly subject to attack by malicious parties who aim to achieve personal goals at the expense of the social good (Caldwell et al., 2019). The problem of election vulnerability to malicious attack has been studied in the broader literature on election control and bribery (Bartholdi III et al., 1992; Tomz and Houweling, 2008; Faliszewski and Rothe, 2016). However, in much of this literature, control is exercised through a change in the election structure (e.g., adding and removing candidates), or by directly changing the preferences of a subset of voters (bribery). A major means of election control that has often been overlooked in the research literature is manipulation of issues that ultimately determine voter preferences over candidates.

A recent model of election control through issue selection attempts to bridge this gap (Lu et al., 2019). The basis of this model is the spatial theory of voting (Downs, 1957; Enelow and Hinich, 1984), in which voters and candidates are represented as points in issue space, and a distance metric determines relative preferences, with voters preferring candidates who are similar to them on issues. In control through issue selection, a malicious party can select a subset of issues that then determines similarity and, consequently, voter preferences.

While capturing some of the intuition about the kinds of manipulations we commonly see (through, say, the spread of misinformation and fake news), control through issue selection nevertheless misses an essential factor: what is ultimately important, and what is at the core of manipulation, is the relative significance, or salience, of issues, with issue selection being a rather extreme special case. A recent example that illustrates this point is Brexit: until 2016, the significance of the issue of U.K. membership in the EU was comparatively negligible (Khetani-Shah and Deutsch, 2019). In 2016, it became one of the central issues, with considerable evidence pointing to Russian interference as a factor (Harper et al., 2019; Sabbagh et al., 2019). In general, malicious parties can impact perceptions of relative issue importance in a variety of ways. For example, fake social media accounts can be used to coordinate widespread mentions of particular issues, increasing their salience compared to others. Similarly, influential individuals, such as celebrities or politicians, may be willing to accept payments to be more or less vocal about particular issues. To reflect the relative difficulty or cost of these actions, we limit the attacker by one of two constraints.

Our model is a significant generalization of the work of Lu et al. (2019). In our version, preferences of a voter over candidates are generated based on similarity in issue space, weighted by the relative importance of issues. We study the complexity of this problem in the context of plurality elections for two common models of voter behavior in the spatial framework: 1) deterministic voting, in which voters always vote for their most preferred candidate, and 2) stochastic voting, where the probability of a voter voting for a candidate is a monotonic function of weighted issue similarity. We show that the control problem is in general NP-hard in either case, even with only 2 candidates. Indeed, for the deterministic case we demonstrate hardness even with only voters, where is the number of issues. Next, we exhibit several tractable special cases. In the deterministic case, if the number of voters is , or the number of issues is constant, election control is in P. In stochastic voting, in turn, control is tractable if the probability of voting for a candidate is linear in their weighted distance from the voter.

Related Work The complexity of controlling elections has seen extensive treatment, starting with the work of Bartholdi III et al. (1992); see Hemaspaandra et al. (2007); Menton (2013); Erdélyi et al. (2015); Chen et al. (2017) for further examples and the survey by Faliszewski and Rothe (2016) for an overview. Variations of this problem consider attacks that add, remove, partition or clone candidates or voters, for a variety of voting rules. However, most of the prior election control literature considers election models in which voter preferences over candidates are given, rather than generated based on distance in issue space. The spatial model of elections, in turn, has received considerable attention in prior literature (Davis and Hinich, 1966; Enelow and Hinich, 1984, 1990; McKelvey and Ordeshook, 1990; Merrill and Groffman, 1999; Anshelevich et al., 2018; Anshelevich and Postl, 2017). However, most of this research has focused on problems other than election control. For example, extensive literature exists on game-theoretic models in which candidates opportunistically select positions in issue space (Downs, 1957; Shen and Wang, 2017; Sabato et al., 2017).

A direct precursor to our model, combining election control with spatial theory of voting, is Lu et al. (2019), who study the model in which an adversary can select an arbitrary subset of issues in an election. These issues are then used to generate voter preferences over candidates, with voters preferring candidates who are closest to them on the adversarially selected issues. We significantly generalize this model by allowing the adversary to change relative importance of issues.

2 Model

Let and be a set of candidates and voters, respectively. Each candidate and voter is a vector over a set of issues defined by and . We consider plurality elections in which each voter is asked to report the most preferred candidate, and the winner is the candidate who tallies the most votes.

Suppose that the relative importance of issues to voters is determined by a weight vector , where for , and . In this model, a voter’s preferences over candidates are determined by the weighted distance from each candidate in issue space. Formally, the weighted distance between a voter and candidate is , and the voter prefers a candidate who is closer according to this weighted distance measure.
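The weighted-distance idea can be illustrated with a minimal sketch. The exact per-issue distance is not recoverable from the text here, so the code below assumes the absolute difference on each issue, scaled by that issue's weight; the function name `weighted_distance` and the toy positions are hypothetical.

```python
from typing import Sequence


def weighted_distance(weights: Sequence[float],
                      voter: Sequence[float],
                      candidate: Sequence[float]) -> float:
    """Assumed form: sum over issues of weight * |voter position - candidate position|."""
    if not (len(weights) == len(voter) == len(candidate)):
        raise ValueError("all vectors must have one entry per issue")
    return sum(w * abs(v - c) for w, v, c in zip(weights, voter, candidate))


# Toy instance with two binary issues (illustrative values only).
voter = [1, 0]
cand_a = [1, 1]  # agrees with the voter on issue 0 only
cand_b = [0, 0]  # agrees with the voter on issue 1 only

even = [0.5, 0.5]    # both issues equally important
skewed = [0.9, 0.1]  # issue 0 made more salient

dist_even = (weighted_distance(even, voter, cand_a),
             weighted_distance(even, voter, cand_b))
dist_skewed = (weighted_distance(skewed, voter, cand_a),
               weighted_distance(skewed, voter, cand_b))
```

With equal weights the voter is indifferent between the two candidates; once issue 0 is weighted more heavily, the candidate who agrees with the voter on issue 0 becomes strictly closer.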

Without loss of generality, suppose that is the attacker’s preferred candidate. In our model, an attacker aims to influence the election by modifying the relative importance of issues. Specifically, the attacker changes into a modified preference vector . Restrictions on the attacker’s strength are given in one of the following forms.

  1. Normed Budget Constraint: Given a budget the attacker’s total perturbation must be less than the budget, i.e. .

  2. Interval Constraint: Given a set of intervals , the attacker’s new weight vector must fall within , i.e. for all .

For both of these constraints, we consider two paradigms for voters selecting their desired candidate.

  1. Deterministic voting: Voter votes for candidate , breaking ties according to candidate order .

  2. Stochastic voting: , where is a proper probability mapping and for all .

Both are common models translating voter-candidate distance into voting behavior (Enelow and Hinich, 1990).
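As a rough illustration of deterministic voting and the plurality tally, the sketch below assumes an absolute-difference weighted distance and assumes that ties are broken in favor of the lower candidate index; both are assumptions, and all function names are hypothetical.

```python
from collections import Counter


def weighted_distance(weights, voter, candidate):
    # Assumed distance: weighted sum of per-issue absolute differences.
    return sum(w * abs(v - c) for w, v, c in zip(weights, voter, candidate))


def deterministic_vote(weights, voter, candidates):
    """Index of the closest candidate; ties go to the lowest index (assumed order)."""
    distances = [weighted_distance(weights, voter, c) for c in candidates]
    return min(range(len(candidates)), key=lambda k: (distances[k], k))


def plurality_tally(weights, voters, candidates):
    """Number of votes each candidate receives under deterministic voting."""
    tally = Counter(deterministic_vote(weights, v, candidates) for v in voters)
    return [tally.get(k, 0) for k in range(len(candidates))]


candidates = [[1, 1], [0, 0]]      # candidate 0 plays the attacker's favorite
voters = [[1, 0], [1, 0], [0, 1]]

before = plurality_tally([0.4, 0.6], voters, candidates)
after = plurality_tally([0.6, 0.4], voters, candidates)
```

In this toy instance, shifting weight from the second issue to the first flips the outcome: the same voters and candidates produce a different plurality winner purely because issue importance changed.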

When voters select candidates deterministically, we consider two objectives for the attacker. The first objective (Max Support) is to maximize the total number of votes for their preferred candidate, :

The second objective (Majority Vote) is to win the plurality vote:

For stochastic voting, we consider the objective of maximizing the expected number of votes for :

3 Deterministic Voting

We begin by investigating our model of election control when voters always vote for their most preferred candidate. For compactness of notation we introduce, for every candidate and voter pair, a preference vector which gives ’s unweighted preference for over on each issue. Let . The preference vector for of over is . The condition for voting for is then

In the case of only two candidates, we omit the index and denote the preference vector for over by .
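One way to picture the preference vector is the sketch below. It assumes that the per-issue preference is the rival's distance minus the preferred candidate's distance on that issue, and that the voter sides with the preferred candidate when the weighted sum is non-negative; the text's exact definitions are not recoverable here, so both choices and all names are assumptions.

```python
def preference_vector(voter, preferred, rival):
    """Assumed per-issue preference: how much closer `preferred` is than `rival`."""
    return [abs(v - r) - abs(v - p) for v, p, r in zip(voter, preferred, rival)]


def votes_for_preferred(weights, pref):
    """Assumed voting condition: weighted preference is non-negative (ties favor `preferred`)."""
    return sum(w * x for w, x in zip(weights, pref)) >= 0


c1, c2 = [1, 1, 0], [0, 0, 1]
voter = [1, 0, 0]

pref = preference_vector(voter, c1, c2)                      # one entry per issue
loses = votes_for_preferred([0.2, 0.6, 0.2], pref)           # issue 1 dominates
wins = votes_for_preferred([0.4, 0.2, 0.4], pref)            # issues 0 and 2 dominate
```

Because the condition is linear in the weights, each voter contributes one linear inequality over the weight vector, which is what makes the later linear and convex programs possible.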

In this section, we will show that both Max Support and Majority Vote are NP-hard, even when there are only two candidates, issues are binary, and the attacker has no constraints on their strength. If these assumptions are further restricted such that there are only voters, where agrees with on exactly issues, then Max Support is still NP-hard.

Although both objectives are hard, even with several strong restrictions, we present sufficient conditions for the problem to become tractable. The following positive results hold for the normed budget constraint when , and for the interval constraint with any . Under the normed budget, with , the results also hold; however, the computed may break the attacker’s budget by some small , i.e. for both objectives we obtain, in polynomial time, a vector that is guaranteed to have . The tractable cases are as follows: Max Support has a polynomial time algorithm when either there are at most voters, or the number of issues and the number of values each issue can take on are both constant, and Majority Vote has a polynomial time algorithm when either the number of voters is constant, or the number of issues and the number of values each issue can take on are both constant.

3.1 Hardness of Control in Deterministic Settings

First we show that even for two candidates and binary issues Majority Vote (TCWP) is NP-complete, and that Max Support (TCWMS) is NP-hard. Both of these results are the product of hardness reductions from the problem of election control by issue selection (Lu et al., 2019). We first give a formal definition of each problem used in the reductions.

Definition 1.

Let be a set of candidates and be a set of voters. Both candidates and voters are vectors in , indicating positions on issues. An adversary selects a subset with for the objective of either determining if can win the plurality (TCIS), or maximizing the total number of votes received (TCMS).

TCIS was shown to be NP-complete and TCMS to be NP-hard by Lu et al. (2019).

Theorem 1.

For 2 candidates, voters, and binary issues, the problem of maximizing the number of votes for , TCWMS, is NP-hard, even when in the normed budget constraint, or when in the interval constraint.

Proof.

To prove this claim we will reduce from the problem of 0-1 issue selection on binary issues with two candidates (TCMS). An instance of TCMS is defined by a set of voters and two candidates all of which select positions on binary issues. The objective of TCMS is to maximize the number of votes for subject to the constraint that and . To reduce from a given instance of TCMS we will add a set of voters that forces any optimal solution to have for some constant that can be associated with in the instance of TCMS. Since , we may assume the adversary is selecting the weight vector rather than a perturbation . Without loss of generality we may assume and .

First, let and . To encode , let be a set of voters obtained by mapping each to a voter where for and , . Compactly, each voter can be represented as . Next we will introduce five more sets of voters which will force any optimal to be binary.

For each construct identical copies of a voter who has when , and otherwise. Denote this set of voters as . Let be a set of voters obtained by taking each and flipping their opinion, i.e. for each add to where for all . Note that .

Now let be the set of voters such that each has and for all other .

For each we create voters of the form , and for each , , . Call this set of voters . Lastly, for each create voters with , for , and , . Call this set of voters . Let .

Note that all voters outside of have at least copies of themselves. Therefore no optimal solution will have a voter vote for if doing so meant losing any voter . As a result we will first examine criteria of optimal solutions over .

Note that the preference vector of has if and if .

Consider the voters in , each of which was created according to some . For each , we have that

where . Similarly, for each voter we have

Therefore, all voters can be made to vote for if

The above system of linear equations has a unique solution, namely for all . For any , there are strictly more copies of than there are total voters in all other voter sets combined. Therefore any optimal solution must have all voters in voting for . As a result we will work under the assumption that for all .

All voters in are of the form for some , , and for all . If is made to vote for then

which would immediately imply that

(1)

Since there are more copies of each voter in than there are total remaining voters and since every voter in can be made to vote for , no optimal solution would have any of these voters vote for , and Equation 1 holds.

Finally, consider the voters in and . Each voter in is of the form for some , , for all . Each voter in is of the form for some , , and , for all . Note that for each there are copies of the corresponding voter in and of the corresponding voter in , and that for each either the set of voters in vote for or the voters in vote for . To see this, fix any and consider the voters in either set. If the voter from votes for then it must be the case that

Alternatively, if the voter is in , then

Both of the constraints cannot hold since would imply that and . Again, there are copies of each voter in and there remain only voters left, so it must be the case that any optimal solution gains either the voter in or in for each . Therefore, any optimal solution must have with .

Thus, as the only voters left to sway are those in , which corresponds to , it must be the case that there is a maximum of voters if and only if an optimal solution in the given instance of TCMS attains voters. ∎

When there are only two candidates, the problem of winning the plurality vote becomes a special case of maximizing the number of votes for . The proof of Theorem 1 can be easily extended to the problem of winning the plurality, by adding a set of voters who agree with on all issues, such that this set “cancels out” any votes for from the constructed voters. These new voters clearly cannot be won over by any nonzero weight vector. This yields the following theorem.

Theorem 2.

For 2 or more candidates, voters, and binary issues, the problem of determining if can win the plurality, TCWP, is NP-complete, even when in the normed budget constraint or when in the interval constraint.

Next, we proceed to considerably strengthen the hardness result in Theorem 1. When there are only two candidates and issues are binary, a partial order can be induced on voters by the number of issues they agree with on. That is, the set of voters can be partitioned into tiers , where is the set of voters who agree with on exactly issues. We now show that even if for all there are only a constant number of voters who agree with on exactly issues, maximizing the number of votes for is still NP-hard.

Theorem 3.

Suppose there are only two candidates, binary issues, and voters. Suppose further that for each . Then maximizing the number of votes for is NP-hard, even when in the normed budget constraint, or when in the interval constraint.

Proof.

To prove this claim, we will reduce from TCWMS, which was shown to be NP-hard in Theorem 1. An instance of TCWMS is defined by two candidates , a voter set , and a set of issues taking on values .

In the constructed instance of our problem let and assume w.l.o.g. that , . We will construct a set of voters that encodes the voters in such that the election only depends on issues . To do this, first decompose the set of voters into disjoint sets , such that . Starting at , iterate through each and create a voter such that for all , for all and otherwise. Under this construction, for any index , there is only a single for which . That is, each voter has either or , for , as terms in . Therefore, each of these index sets can be associated with a single index, say , where for all , and . Under this simplified version of indices, we see that for , and for that the contribution from all issues, to the voter’s preference sum is . If we take any three of these sums as linear inequalities for voters , we get

Since each , the only satisfying assignment to these three inequalities is for all . Therefore the objectives of both problems align and this restricted version of TCWMS is NP-hard. ∎

3.2 Tractable Special Cases

We now return to the setting when there are candidates, voters, and issues are real-valued. Although both Max Support and Majority Vote are NP-hard even with several strong restrictions, we now show sufficient conditions for either objective to be computed efficiently, as well as algorithms to do so.

Under the normed budget constraint when , or under the interval constraint when , if the number of voters is , or the number of issues is constant and issue values are from a set of constant size, then Max Support can be computed in polynomial time. For the normed budget constraint with , if the maximum number of votes for is when , then for a perturbation where and obtains votes, can be found in polynomial time with respect to the input size and . Moreover, as , asymptotically approaches .

Under similar assumptions on the number of voters or issues, the objective of Majority Vote can be computed in polynomial time for the normed budget constraint with , or for the interval constraint with . In the case of , suppose that , assuming can be made to win the election. For a perturbation with winning the election and , can be found in polynomial time with respect to the problem size and . This might break the attacker’s budget by at most . If then simply taking wins the election within the budget constraint. However, in the case when , it will be unknown whether there exists a with such that wins the election. If the attacker is allowed to break their budget by , i.e. then taking wins the election. Further, as , asymptotically approaches .

The existence of polynomial time algorithms for these two objectives is particularly interesting, given that the problem was NP-hard in the case of control by issue selection even for a single voter (Lu et al., 2019).

In both cases we use Algorithm 1, where unanimity-program refers to an optimization program in which all voters in the given demographic, , are made to unanimously vote for a given candidate.

Result: weight vector achieving the most voters
for do
    Solve unanimity-program over ;
    if unanimity-program feasible and is within budget restriction then
        Store and ;
    end if
end for
return argmax
Algorithm 1 Maximizing votes for
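The enumeration in Algorithm 1 can be sketched as follows. This is only an illustration: the unanimity program is abstracted as a hypothetical callback `unanimity_oracle` that returns a weight vector when a demographic can be made unanimous within the attacker's restriction and `None` otherwise. The toy oracle below checks a small hand-picked list of weight vectors instead of solving a linear or convex program, and it assumes the linear voting condition "weighted preference is non-negative".

```python
from itertools import combinations
from typing import Callable, Optional, Tuple

Weights = Tuple[float, ...]
Oracle = Callable[[Tuple[int, ...]], Optional[Weights]]


def max_support(num_voters: int, unanimity_oracle: Oracle):
    """Return (demographic, weights) for a largest demographic the oracle can satisfy."""
    for size in range(num_voters, 0, -1):  # try the largest demographics first
        for demographic in combinations(range(num_voters), size):
            weights = unanimity_oracle(demographic)
            if weights is not None:
                return demographic, weights
    return (), None  # no voter can be won


# Hypothetical preference vectors (one per voter) and allowed weight vectors.
prefs = [[1, -1], [-1, 1], [1, 1]]
allowed = [(0.8, 0.2), (0.3, 0.7)]  # stand-in for "within the budget restriction"


def toy_oracle(demographic):
    # Stand-in for the unanimity program: scan the allowed weight vectors.
    for w in allowed:
        if all(sum(wj * pj for wj, pj in zip(w, prefs[i])) >= 0 for i in demographic):
            return w
    return None


best_demo, best_w = max_support(len(prefs), toy_oracle)
```

Searching from the largest subsets downward and stopping at the first feasible one gives the same answer as storing every feasible demographic and taking the argmax. In the worst case it still calls the oracle once per subset of voters, so the number of voters has to be small for this to be practical.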

Recall that for a given candidate–voter pair , the vector gives ’s per-issue preference for over .

Under the normed budget constraint, the unanimity program for a demographic, , is given by

(2)
s.t.

and under the interval constraint, the unanimity program is given by the following linear feasibility problem:

(3)
Theorem 4.

Suppose there are candidates, real-valued issues, and voters and the attacker is restricted by , where . Then Algorithm 1 computes Max Support for , in polynomial time.

Proof.

Since , . Each subset of voters is referred to as a demographic. For any , determining if all voters in can be made to unanimously vote for can be computed by solving Program 2. We minimize over in order to determine if the minimum change to , such that all of votes for , is larger than . That is, demographics that cannot be made to vote for come in two forms: those where the constraint set is infeasible, and those where the value of the optimal solution is greater than the budget . By selecting the largest viable demographic, the maximum votes for can be found. For Program 2 can be solved in polynomial time. When the program reduces to a linear program, and when the program reduces to a positive definite quadratic program, all of which have polynomial time algorithms. ∎

Theorem 5.

Suppose there are candidates, real-valued issues, and voters, and the attacker is restricted by , where . Let . Then for any a perturbation, , can be computed in polynomial time with respect to the problem size and , that obtains at least as many votes as and .

Proof.

Similarly to Theorem 4, Program 2 can be solved for each demographic. In contrast to Theorem 4, when we are solving a general convex program, and thus polynomial time solutions will be off by at most a factor of . For a given demographic, , suppose the optimal solution to Program 2 is . Then for we can obtain a solution such that . As before, demographics that cannot be made to unanimously vote for come in two forms: demographics in which the constraints of Program 2 are infeasible, and demographics for which the optimal has . We need not consider demographics of the first type, since the approximation of the convex program will not return a vector if the constraint set is infeasible. Via the same strategy as Theorem 4, we solve each program and take the largest demographic that can be made to unanimously vote for . The key difference in this case is that we may be selecting a demographic that has more voters than the optimal solution, and requires budget to obtain. Thus, if is the smallest vector, with , that obtains the maximum votes for , then . ∎

Theorem 6.

Suppose there are candidates, real-valued issues, and voters and the attacker has the interval constraint for some interval . Then Algorithm 1 computes Max Support for , in polynomial time.

Proof.

Each unanimity program is now given by Program 3, which is simply a feasibility LP. Therefore determining if a particular demographic can be made to vote for can be done in polynomial time. As stated in the proof of Theorem 4, there are demographics that need to be checked and thus the maximum number of votes for can be computed in polynomial time. ∎
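To make the feasibility check concrete, here is a deliberately tiny special case that can be solved exactly without an LP solver. It assumes two issues, weights normalized as `(t, 1 - t)`, a single interval `[lo, hi]` for `t` (already combining the intervals of both weights), and the linear voting condition "weighted preference is non-negative". All of these are assumptions for illustration; a general instance would need a real LP solver.

```python
def unanimity_interval(prefs, lo, hi):
    """Return a feasible t in [lo, hi] with weights (t, 1 - t), or None if none exists.

    Each voter's condition t * p0 + (1 - t) * p1 >= 0 is rewritten as a * t >= b,
    with a = p0 - p1 and b = -p1, and the feasible interval is narrowed accordingly.
    """
    for p0, p1 in prefs:
        a, b = p0 - p1, -p1
        if a > 0:
            lo = max(lo, b / a)   # lower bound on t
        elif a < 0:
            hi = min(hi, b / a)   # upper bound on t
        elif b > 0:
            return None           # condition is independent of t and violated
    return lo if lo <= hi else None


# Two voters who can both be satisfied once issue 0 gets at least half the weight.
feasible_t = unanimity_interval([[1, -1], [1, 1]], lo=0.2, hi=0.9)

# Two voters with opposite preferences, and an interval that excludes t = 0.5.
infeasible_t = unanimity_interval([[1, -1], [-1, 1]], lo=0.6, hi=0.9)
```

Each voter in the demographic shrinks the admissible interval for `t`, and the demographic is feasible exactly when the interval stays non-empty, which is the one-dimensional version of intersecting half-spaces with a box.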

Theorem 7.

Suppose there are candidates, voters, and issues, each of which take on values from a set of constant size. Then Algorithm 1 computes Max Support for in polynomial time, for the normed budget restriction when .

Proof.

Since and positions are selected from a set of constant size, say , then only distinct voters can exist. So, there may be voters, but at most of them need to be investigated. Let be the set of all unique voters in . For each we also keep track of the number of times appears in . So, and there are only a constant number of programs to solve, the only difference being that we now choose the feasible demographic representing the maximum number of voters in , rather than . As stated before, each program can be efficiently solved when . ∎
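The bookkeeping described in this proof, collapsing identical voters and remembering how often each appears, might look like the following sketch. The helper names and the toy voter list are hypothetical.

```python
from collections import Counter


def unique_voters(voters):
    """Map each distinct position vector to the number of voters holding it."""
    return Counter(tuple(v) for v in voters)


def demographic_size(voter_types, counts):
    """Number of actual voters represented by a set of distinct voter types."""
    return sum(counts[t] for t in voter_types)


voters = [[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [1, 1]]
counts = unique_voters(voters)

# A demographic is now a set of voter types, scored by its total multiplicity.
size = demographic_size([(1, 0), (1, 1)], counts)
```

Demographics are then enumerated over the distinct types only, and the best feasible demographic is the one with the largest total multiplicity rather than the largest number of types.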

Theorem 8.

Suppose there are candidates, voters, issues, each of which take on values from a set of constant size, and the attacker is restricted by , where . Let . Then for any a perturbation, , can be computed in polynomial time with respect to the problem size and , that obtains at least as many votes as and .

Proof.

We again use the idea in the proof of Theorem 7 by keeping track of the unique voters. Once we have the set of unique voters, the proof is identical to that of Theorem 4. ∎

Theorem 9.

Suppose there are candidates, voters, and issues, each of which take on values from a set of constant size. Then Algorithm 1 computes Max Support for in polynomial time, under the interval constraint for given intervals .

Proof.

After constructing the set of unique voters, we solve a constant number of linear programs and take the vector yielding the largest number of votes for . ∎

Theorem 10.

Suppose that there are candidates and either , or where each issue takes on values from a set of constant size. Then under the budgeted constraint for , Majority Vote can be computed in polynomial time.

Proof.

In this setting the number of unique partitions of is constant. Thus, if there are candidates, there are unique ways in which each partition can be assigned to a candidate. This assignment of disjoint demographics to candidates is equivalent to that particular demographic being made to vote for that candidate. Each pairing, for a given partition , can be given by a set . To check if there exists a weight vector such that the given pairing is attainable, one need only solve Program 2, with the additional set of linear constraints that for all , , and for all , where is the preference vector of for over . There are only programs to solve, each of which takes polynomial time. ∎
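The enumeration over assignments of voter groups to candidates can be sketched as below. The sketch only lists the assignments under which candidate 0 (standing in for the attacker's favorite) strictly wins the plurality; requiring a strict win is a conservative assumption about ties, and each surviving assignment would still need its own feasibility program. Function and variable names are hypothetical.

```python
from itertools import product


def winning_assignments(type_counts, num_candidates):
    """Yield assignments (one candidate index per voter type) where candidate 0 strictly wins."""
    for assignment in product(range(num_candidates), repeat=len(type_counts)):
        tally = [0] * num_candidates
        for count, cand in zip(type_counts, assignment):
            tally[cand] += count
        if all(tally[0] > tally[k] for k in range(1, num_candidates)):
            yield assignment


type_counts = [3, 2, 1]   # multiplicities of three distinct voter types
num_candidates = 2

total = num_candidates ** len(type_counts)            # all assignments considered
winners = list(winning_assignments(type_counts, num_candidates))
```

The number of assignments is exponential in the number of voter types, so the approach is only polynomial when that number is bounded, as the theorem assumes.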

Theorem 11.

Suppose that there are candidates and either , or where each issue takes on values from a set of constant size. Suppose further that the attacker is restricted by , where . Let . Then for any a perturbation, , can be computed in polynomial time with respect to the problem size and , that wins the election and .

Proof.

As in the proof of Theorem 10, the feasibility of each assignment of voters to candidates can be formulated as a convex program. An assignment of voters to candidates is valid if the program is feasible and if the optimal solution has value at most . For these programs cannot be solved exactly in polynomial time, but for any where is polynomial with respect to the problem size, a solution , with , can be computed efficiently. Thus, we obtain solutions for for each program and take the smallest one under the norm such that wins the election. ∎

Theorem 12.

Suppose that there are candidates and either , or where each issue takes on values from a set of constant size. Then under the interval constraint, for , Majority Vote can be computed in polynomial time.

Proof.

Under the interval constraint each possible assignment of voters to candidates can be formulated as a linear program. As shown in the proof of Theorem 10 there are only a constant number of such assignments and thus we need only solve a constant number of linear programs and then choose the assignment of voters to candidates such that wins the election. ∎

4 Stochastic Voting

Another common model for candidate selection is that of stochastic voting, where votes are cast via a distribution over candidates (Schofield et al., 1998). More precisely, let be a function that maps weighted distance between a voter and a candidate to a probability of the voter voting for this candidate. Next, we show that election control in this setting for general is NP-hard even when we only have 2 candidates. For this, suppose that belongs to a general class of sigmoidal functions (Udell and Boyd, 2013), of which the logistic function is a well-known member.

Definition 2.

A function is said to be sigmoidal if it is Lipschitz continuous and one of the following is true: is convex, is concave, or there exists such that is concave on and convex on .
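For intuition, the expected-vote objective under a logistic voting function can be sketched as follows. How the text feeds distances into the function is not recoverable here, so this sketch assumes two candidates and applies the logistic function to the difference in weighted distances; the distance form, the `steepness` parameter, and all names are assumptions.

```python
import math


def logistic(x, steepness=1.0):
    """Standard logistic function, used here as an example sigmoidal function."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def weighted_distance(weights, voter, candidate):
    # Assumed distance: weighted sum of per-issue absolute differences.
    return sum(w * abs(v - c) for w, v, c in zip(weights, voter, candidate))


def expected_votes(weights, voters, c1, c2, steepness=1.0):
    """Expected votes for c1 when each voter picks c1 with logistic probability."""
    total = 0.0
    for v in voters:
        # Positive gap means c1 is closer than c2 under the current weights.
        gap = weighted_distance(weights, v, c2) - weighted_distance(weights, v, c1)
        total += logistic(gap, steepness)
    return total


voters = [[1, 0]]
c1, c2 = [1, 1], [0, 0]

even = expected_votes([0.5, 0.5], voters, c1, c2)
skewed = expected_votes([0.9, 0.1], voters, c1, c2)
```

Unlike the deterministic case, a shift in weights changes each voter's contribution smoothly, so the objective is a sum of sigmoidal terms in the weight vector rather than a count of satisfied linear inequalities.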

We now show the hardness of maximizing the expected number of votes for even in the two-candidate case.

Theorem 13.

Suppose there are or more candidates, voters and issues, where votes are cast via a sigmoidal function. Then maximizing the expected number of votes for is NP-hard, even when , or .

Proof.

We reduce from the known NP-hard problem Max-2SAT. An instance of Max-2SAT can be defined by a set of Boolean variables and a set of clauses where each . Let the number of issues be , let and let . Define . Create voters of the form for all and . For each Boolean variable, create voters of the form if , , and . Additionally, for each , create voters of the form for all and . Finally, we encode each clause as a voter. Clauses can take on one of the three forms and we map each form to a voter in the following way:

  1. yields and for all .

  2. yields and for all .

  3. yields , , and for all