DiRe Committee : Diversity and Representation Constraints in Multiwinner Elections

07/15/2021 ∙ by Kunal Relia, et al. ∙ NYU college 26

The study of fairness in multiwinner elections focuses on settings where candidates have attributes. However, voters may also be divided into predefined populations under one or more attributes (e.g., "California" and "Illinois" populations under the "state" attribute), which may be the same as or different from candidate attributes. Models that focus on candidate attributes alone may systematically under-represent smaller voter populations. Hence, we develop a model, DiRe Committee Winner Determination (DRCWD), which delineates candidate and voter attributes to select a committee by specifying diversity and representation constraints and a voting rule. We show the generalizability of our model, and analyze its computational complexity, inapproximability, and parameterized complexity. We develop a heuristic-based algorithm, which finds the winning DiRe committee in under two minutes on 63% of the instances of synthetic datasets and on 100% of instances of real-world datasets. We present an empirical analysis of the running time, feasibility, and utility traded-off. Overall, DRCWD motivates that a study of multiwinner elections should consider both its actors, namely candidates and voters, as candidate-specific "fair" models can unknowingly harm voter populations, and vice versa. Additionally, even when the attributes of candidates and voters coincide, it is important to treat them separately, as having a female candidate on the committee, for example, is different from having a candidate on the committee who is preferred by the female voters, and who themselves may or may not be female.

1 Introduction

The problem of selecting a committee from a given set of candidates arises in multiple domains; it ranges from political sciences (e.g., selecting the parliament of a country) to recommendation systems (e.g., selecting the movies to show on Netflix). Formally, given a set of candidates (politicians and movies, respectively), a set of voters (citizens and Netflix subscribers, respectively) give their ordered preferences over all candidates to select a committee of size $k$. These preferences can be stated directly in the case of parliamentary elections, or they can be derived based on input, such as when Netflix subscribers’ viewing behavior is used to derive their preferences. In this paper, we focus on selecting a $k$-sized (fixed-size) committee using direct, ordered, and complete preferences.

Which committee is selected depends on the committee selection rule, also called a multiwinner voting rule. Examples of commonly used families of rules when a complete ballot of each voter is given are Condorcet principle-based rules [faliszewski2016committee], which select a committee that is at least as strong as every other committee in a pairwise majority comparison, approval-based voting rules [faliszewski2016committee, kilgour2010approval, sanchez2017proportional] where each voter submits an approval ballot approving a subset of candidates, and ordinal preference ballot-based voting rules like $k$-Borda and $\beta$-Chamberlin-Courant ($\beta$-CC) [elkind2017properties, faliszewski2017multiwinner] that are analogous to single-winner rules. We note that this version of the CC rule is different from the Chamberlin–Courant approval voting rule used in the context of approval elections [aziz2017justified, lackner2021consistent]. In this paper, we focus on ordinal preference-based rules that are analogous to single-winner rules. We refer readers to Section 2.2 of [faliszewski2017multiwinner] for further details on the commonly used families of multiwinner voting rules.

Recent work on fairness in multiwinner elections shows that these rules can create or propagate biases by systematically harming candidates coming from historically disadvantaged groups [bredereck2018multiwinner, celis2017multiwinner]. Hence, diversity constraints on candidate attributes were introduced to overcome this problem. However, voters may be divided into predefined populations under one or more attributes, which may be different from candidate attributes. For example, voters in Figure 1(b) are divided into “California” and “Illinois” populations under the “state” attribute. Models that focus on candidate attributes alone may systematically under-represent smaller voter populations.

Figure 1: (a) Candidates with the “gender” attribute (and their Borda scores) and (b) voters with the “state” attribute. A winning committee (size $k$ = 2) is computed separately for California and Illinois, states in the United States.
Example 1.

Consider an election consisting of $m$ = 4 candidates (Figure 1(a)) and $n$ = 4 voters giving ordered preferences over the $m$ candidates (Figure 1(b)) to select a committee of size $k$ = 2. Candidates and voters have one attribute each, namely gender and state, respectively. A $k$-Borda winning committee is computed for each voter population, California and Illinois. (Footnote 1: The Borda rule associates a score of $m - i$ with the $i$-th position, and $k$-Borda selects the $k$ candidates with the highest Borda scores.)

Suppose that we impose a diversity constraint that requires the committee to have at least one candidate of each gender, and a representation constraint that requires the committee to have at least one candidate from the winning committee of each state. Observe that the highest-scoring committee (score = 17) is representative but not diverse, since both of its candidates are male. Further, the highest-scoring diverse committee (score = 13) is not representative because it does not include any winning candidate from Illinois, the smaller state. The highest-scoring committee that is both diverse and representative has a score of 12.

This example illustrates the utility cost that enforcing additional constraints inevitably incurs.

Note that, in contrast to prior work in computational social choice, we incorporate voter attributes that are separate from candidate attributes. Also, our work is different from the notion of “proportional representation” [sanchez2017proportional, brill2018multiwinner, monroe1995fully], where the number of candidates selected in the committee from each group is proportional to the number of voters preferring that group, and from its variants such as “fair” representation [koriyama2013optimal]. All these approaches dynamically divide the voters based on the cohesiveness of their preferences. Another related work, multi-attribute proportional representation [lang2018multi], couples candidate and voter attributes. An important observation we make here is that, even if the attributes of the candidates and of the voters coincide, it may still be important to treat them separately in committee selection. This is because having a female candidate on the committee, for example, is different from having a candidate on the committee who is preferred by the female voters, and who themselves may or may not be female.

Contributions. In this paper, we define a model that treats candidate and voter attributes separately during committee selection, and thus enables selection of the highest-scoring diverse and representative committee. We show the generality of our model using appropriate parameters. We then show NP-hardness of committee selection under our model for various settings, give results on inapproximability and parameterized complexity, and present a heuristic-based algorithm. Finally, we present an experimental evaluation using real and synthetic datasets, in which we show the efficiency of our algorithm, analyze the feasibility of committee selection and illustrate the utility trade-offs.

2 Related Work

Our work is at an intersection of multiple ideas, and hence, in this section, we briefly summarize the related work spread across different domains, some of which we already discussed in previous sections.

Fairness in Ranking and Set Selection.

There is a growing understanding in the field of theoretical computer science about the possible presence of algorithmic bias in multiple domains [baeza2016data, bellamy2018ai, danks2017algorithmic, hajian2016algorithmic, lambrecht2019algorithmic], especially in variants of the set selection problem [stoyanovich2018online]. The study of fairness in ranking and set selection, closely related to the study of multiwinner elections, uses constraints in algorithms to mitigate bias against historically disadvantaged groups. Stoyanovich et al. [stoyanovich2018online] use constraints in the streaming set selection problem, and Yang and Stoyanovich [yang2017measuring] and Yang et al. [yang2019balanced] use constraints for ranked outputs. Kuhlman and Rundensteiner [Kuhlman2020rank] focus on fair rank aggregation and Bei et al. [bei2020candidate] use proportional fairness constraints. Our work adds to the research on the use of constraints to mitigate algorithmic bias.

Fairness in Participatory Budgeting.

Multiwinner elections are a special case of participatory budgeting, and fairness in the latter domain has also received particular attention. For example, when projects (equivalent to candidates) are divided into groups, fairness has been modeled through lower and upper bounds on the utility achieved and on the cost of projects used in every group [patel2020group]. Fluschnik et al. [fluschnik2019fair] aim to achieve fairness among projects using their objective function. Next, Hershkowitz et al. [hershkowitz2021district] have studied fairness from the utility received by the districts (equivalent to voters), Peters et al. [peters2020proportional] define axioms for proportional representation of voters, and Lackner et al. [lackner2021fairness] define fairness in long-term participatory budgeting from the voters’ perspective. However, we note that none of these works simultaneously considers fairness from the perspective of both the projects and the districts.

Unconstrained Multiwinner Elections and Proportional Representation.

The complexity of unconstrained committee winner determination (UCWD) has received attention [faliszewski2017multiwinner]. Selecting a committee using the Chamberlin-Courant (CC) [chamberlin1983representative] rule is NP-hard [procaccia2008complexity], and approximation algorithms have resulted in the best known ratio of $1 - 1/e$ [skowron2015achieving, lu2011budgeted]. Yang and Wang [yang2018parameterized] studied its parameterized complexity. Another commonly studied rule, Monroe [monroe1995fully], is also NP-hard [betzler2013computation, elkind2017properties]. Sonar et al. [sonar2020complexity] showed that even checking whether a given committee is optimal when using these two rules is hard. Finally, the hardness of problems involving restricted voter preferences and committee selection rules has been studied [elkind2015structure, peters2017preferences], and so has proportional representation in dynamic ranking [israel2021dynamic].

Constrained Multiwinner Elections.

The complexity of using diversity constraints in elections has also received particular attention. Goalbase score functions, which specify an arbitrary set of logic constraints and let the score capture the number of constraints satisfied, could be used to ensure diversity [uckelman2009representing]. Using diversity constraints over multiple attributes in single-winner elections is NP-hard [lang2018multi]. Also, using diversity constraints over multiple attributes in multiwinner elections is NP-hard, which has led to approximation algorithms and matching hardness of approximation results by Bredereck et al. [bredereck2018multiwinner] and Celis et al. [celis2017multiwinner]. Finally, due to the hardness of using diversity constraints over multiple attributes in approval-based multiwinner elections [brams1990constrained], these have been formalized as integer linear programs (ILP) [potthoff1990use]. In contrast, Skowron et al. [skowron2015achieving] showed that ILP-based algorithms fail in the real world when using ranked voting-related proportional representation rules like the Chamberlin-Courant and Monroe rules, even when there are no constraints.

Overall, the work by Bredereck et al. [bredereck2018multiwinner], Celis et al. [celis2017multiwinner], and Lang and Skowron [lang2018multi] is closest to ours. However, we differ as we: (i) consider elections with predefined voter populations, (ii) delineate voter and candidate attributes, and (iii) consider representation and diversity constraints. No previous work, to the best of our knowledge, has considered fairness from the perspective of voter attributes or has delineated candidate and voter attributes even when they coincide.

3 Preliminaries and Notation

Multiwinner Elections.

Let $E = (C, V)$ be an election consisting of a candidate set $C = \{c_1, \dots, c_m\}$ and a voter set $V = \{v_1, \dots, v_n\}$, where each voter $v \in V$ has a preference list $\succ_v$ over $m$ candidates, ranking all of the candidates from the most to the least desired. $\mathrm{pos}_v(c)$ denotes the position of candidate $c \in C$ in the ranking of voter $v \in V$, where the most preferred candidate has position 1 and the least preferred has position $m$.

Given an election $E = (C, V)$ and a positive integer $k \in [m]$ (for $k \in \mathbb{N}$, we write $[k] = \{1, \dots, k\}$), a multiwinner election selects a $k$-sized subset of candidates (or a committee) $W$ using a multiwinner voting rule $f$ (discussed later) such that the score of the committee $f(W)$ is the highest. Formally, given $E$ and $k$, $f$ outputs the required committee $W$ of exactly $k$ candidates with the highest score. We assume ties are broken using a pre-decided priority order over all candidates.

Candidate Groups.

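The candidates have $\mu$ attributes, $A_1, \dots, A_\mu$, such that $\mu \in \mathbb{Z}$ and $\mu \geq 0$. Each attribute $A_i$, for all $i \in [\mu]$, partitions the candidates into $g_i \in [m]$ groups, $A_{i,1}, \dots, A_{i,g_i} \subseteq C$. Formally, $A_{i,j} \cap A_{i,j'} = \emptyset$, $\forall j, j' \in [g_i], j \neq j'$. For example, candidates in Figure 1(a) have one attribute, gender ($\mu$ = 1), with two disjoint groups, male and female ($g_1$ = 2). Overall, the set $\mathcal{G}$ of all such arbitrary and potentially non-disjoint groups will be $A_{1,1}, \dots, A_{\mu,g_\mu} \subseteq C$.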

The number of groups a candidate belongs to is equal to the number of attributes $\mu$.

Voter Populations.

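The voters have $\pi$ attributes, $A'_1, \dots, A'_\pi$, such that $\pi \in \mathbb{Z}$ and $\pi \geq 0$. The voter attributes may be different from the candidate attributes. Each attribute $A'_i$, for all $i \in [\pi]$, partitions the voters into $p_i \in [n]$ populations, $P_{i,1}, \dots, P_{i,p_i} \subseteq V$. Formally, $P_{i,j} \cap P_{i,j'} = \emptyset$, $\forall j, j' \in [p_i], j \neq j'$. For example, voters in Figure 1(b) have one attribute, state ($\pi$ = 1), which has two populations, California and Illinois ($p_1$ = 2). Overall, the set $\mathcal{P}$ of all such predefined and potentially non-disjoint populations will be $P_{1,1}, \dots, P_{\pi,p_\pi} \subseteq V$.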

The number of populations a voter belongs to is equal to the number of attributes $\pi$. Additionally, we are given $W_P$, the winning committee of each population $P \in \mathcal{P}$. We note that a fine-grained accounting of representation is not possible in our model. This is because when a committee selection rule such as the Chamberlin-Courant rule is used to determine each population’s winning committee $W_P$, a complete ranking of each population’s collective preferences is not available. Thus, we have designed our model to only consider each population’s winning committee $W_P$.

Multiwinner Voting Rules.

There are multiple types of multiwinner voting rules, also called committee selection rules. In this paper, we focus on committee selection rules $f$ that are based on single-winner positional voting rules, and are monotone and submodular [bredereck2018multiwinner, celis2017multiwinner].

Definition 1.

Chamberlin–Courant (CC) rule: The CC rule [chamberlin1983representative] associates each voter with a candidate in the committee who is their most preferred candidate in that committee. The score of a committee is the sum of scores given by voters to their associated candidate. Specifically, $\beta$-CC uses the Borda positional voting rule such that each voter assigns a score of $m - i$ to their $i$-th ranked candidate, who is their highest ranked candidate in the committee.
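As an illustration of Definition 1, the following minimal sketch scores a committee under a Borda-based CC rule. The helper names (`borda_score`, `cc_score`) and the three-voter profile are hypothetical, and the Borda convention used here (a score of m minus the 1-based position) is an assumption rather than something the definition above fixes.

```python
def borda_score(ranking, candidate):
    """Borda score of `candidate` in one ranking (best first).

    Assumed convention: m - position, with positions starting at 1.
    """
    m = len(ranking)
    return m - (ranking.index(candidate) + 1)


def cc_score(profile, committee):
    """Sum, over voters, of the Borda score each voter gives to their
    most preferred member of `committee`."""
    total = 0
    for ranking in profile:
        # The voter's representative is their highest-ranked committee member.
        representative = min(committee, key=ranking.index)
        total += borda_score(ranking, representative)
    return total


# Hypothetical profile: each inner list is one voter's ranking, best first.
profile = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["d", "c", "b", "a"],
]

print(cc_score(profile, {"a", "d"}))
print(cc_score(profile, {"a", "b"}))
```

Because each voter contributes only the score of one representative, adding a second candidate that the same voters already like adds little, which is why this rule is submodular but not separable.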

Definition 2.

Monroe rule: The Monroe rule [monroe1995fully] dynamically divides the voters into $k$ populations based on the cohesiveness of their preferences, where each population has size $n/k$ (assuming $k$ divides $n$). Then, each subpopulation’s most preferred candidate is selected into the $k$-sized committee. Formally, for each population, say $P$, select the candidate that has the highest score for that subpopulation. In other words, each candidate in the committee represents an equal number of voters.

A special case of submodular functions are separable functions, for which the score of a committee is the sum of the scores of the individual candidates in the committee. Formally, $f$ is separable if it is submodular and $f(W) = \sum_{c \in W} f(c)$ [bredereck2018multiwinner]. Monotone and separable selection rules are natural and are considered good when the goal of an election is to shortlist a set of individually excellent candidates [faliszewski2017multiwinner]:

Definition 3.

$k$-Borda rule: The $k$-Borda rule outputs committees of $k$ candidates with the highest Borda scores.
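The separable case can be sketched in a few lines. The function names, the profile, and the priority list below are hypothetical, and the Borda convention (m minus the 1-based position) is again an assumption. Ties are broken by a fixed priority order, as assumed in the preliminaries.

```python
def borda_totals(profile):
    """Total Borda score of every candidate (assumed score: m - position)."""
    m = len(profile[0])
    totals = {c: 0 for c in profile[0]}
    for ranking in profile:
        for position, candidate in enumerate(ranking, start=1):
            totals[candidate] += m - position
    return totals


def k_borda(profile, k, priority):
    """Return the k candidates with the highest total Borda score.

    `priority` is a pre-decided order over candidates used to break ties.
    """
    totals = borda_totals(profile)
    ordered = sorted(totals, key=lambda c: (-totals[c], priority.index(c)))
    return ordered[:k]


# Hypothetical profile: each inner list is one voter's ranking, best first.
profile = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["d", "c", "b", "a"],
]

print(borda_totals(profile))
print(k_borda(profile, 2, priority=["a", "b", "c", "d"]))
```

Since the committee score is just the sum of per-candidate totals, sorting once is enough; no search over committees is needed in the unconstrained setting.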

4 DiRe Committee Model

In this section, we formally define a model to select a diverse and representative committee, namely a DiRe committee, and show its generality.

Definition 4.

Unconstrained Committee Winner Determination (UCWD): We are given a set $C$ of $m$ candidates, a set $V$ of $n$ voters such that each voter $v \in V$ has a preference list $\succ_v$ over $m$ candidates, a committee selection rule $f$, and a committee size $k \in [m]$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of UCWD is to select a committee $W \in \mathcal{W}$ that maximizes $f(W)$.

We now discuss the diversity and representation constraints. The lowest possible value that these constraints can take is 1, which replicates real-world scenarios. For instance, the United Nations charter guarantees at least one representative to each member country in the United Nations General Assembly, independent of the country’s population. Similarly, each state of the United States of America is guaranteed at least one representative in the US House of Representatives. Hence, from a fairness perspective, each candidate group and voter population deserves at least one candidate in the committee. Theoretically, all results in this paper hold even if the lowest possible value that the constraints can take is 0.

Diversity Constraints, denoted by $l^D_G$ for each candidate group $G \in \mathcal{G}$, enforce at least $l^D_G$ candidates from the group $G$ to be in the committee $W$. Formally, for all $G \in \mathcal{G}$, $|G \cap W| \geq l^D_G$. We note that we do not propose to use upper bounds, as they induce a quota system, which is not desirable from a social choice perspective.

Representation Constraints, denoted by $l^R_P$ for each voter population $P \in \mathcal{P}$, enforce at least $l^R_P$ candidates from the population $P$’s committee $W_P$ to be in the committee $W$. Formally, for all $P \in \mathcal{P}$, $|W_P \cap W| \geq l^R_P$. We again do not propose to use upper bounds, as they induce the undesirable quota system.
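Both kinds of constraints are lower bounds on set intersections, so checking them is straightforward. The sketch below is a minimal illustration; the function name `is_dire` and all group, population, and candidate names are hypothetical and chosen only for the example.

```python
def is_dire(committee, groups, div_lower, winners, rep_lower):
    """Check the diversity and representation lower bounds for a committee.

    groups:    candidate group name -> set of candidates in that group
    div_lower: candidate group name -> minimum number of group members required
    winners:   voter population name -> that population's winning committee
    rep_lower: voter population name -> minimum overlap with that committee
    """
    committee = set(committee)
    diverse = all(len(committee & groups[g]) >= div_lower[g] for g in groups)
    representative = all(
        len(committee & winners[p]) >= rep_lower[p] for p in winners
    )
    return diverse and representative


# Hypothetical instance with one candidate attribute and one voter attribute.
groups = {"male": {"a", "b"}, "female": {"c", "d"}}
div_lower = {"male": 1, "female": 1}
winners = {"CA": {"a", "b"}, "IL": {"b", "c"}}
rep_lower = {"CA": 1, "IL": 1}

print(is_dire({"a", "b"}, groups, div_lower, winners, rep_lower))  # not diverse
print(is_dire({"a", "d"}, groups, div_lower, winners, rep_lower))  # misses IL
print(is_dire({"b", "c"}, groups, div_lower, winners, rep_lower))  # satisfies both
```

Note that the representation check uses each population's winning committee rather than the voters themselves, matching the modeling choice described in the preliminaries.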

Definition 5.

($\mu$, $\pi$)-DiRe Committee Feasibility (($\mu$, $\pi$)-DRCF): We are given an instance of election $E = (C, V)$, a committee size $k \in [m]$, a set of candidate groups $\mathcal{G}$ over $\mu$ attributes and their diversity constraints $l^D_G$ for all $G \in \mathcal{G}$, and a set of voter populations $\mathcal{P}$ over $\pi$ attributes and their representation constraints $l^R_P$ and the winning committees $W_P$ for all $P \in \mathcal{P}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of ($\mu$, $\pi$)-DRCF is to select committees $W \in \mathcal{W}$ that satisfy the diversity and representation constraints such that $|G \cap W| \geq l^D_G$ for all $G \in \mathcal{G}$ and $|W_P \cap W| \geq l^R_P$ for all $P \in \mathcal{P}$. All such committees that satisfy the constraints are called DiRe committees.

If a committee selection rule $f$ is also an input to the feasibility problem, we get the ($\mu$, $\pi$, $f$)-DRCWD problem:

Definition 6.

($\mu$, $\pi$, $f$)-DiRe Committee Winner Determination (($\mu$, $\pi$, $f$)-DRCWD): Given an instance of ($\mu$, $\pi$)-DRCF and a committee selection rule $f$, let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of ($\mu$, $\pi$, $f$)-DRCWD is to select a committee $W \in \mathcal{W}$ that maximizes $f(W)$ among all DiRe committees.
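For tiny instances, winner determination can be illustrated by exhaustive search: enumerate all committees of the target size, keep the feasible ones, and take the best score. The sketch below is only an illustration, not the heuristic algorithm of the paper, and it runs in exponential time. It assumes a separable rule given as per-candidate scores. The scores and winning committees are hypothetical values chosen so that the totals mirror those quoted in Example 1 (17, 13, and 12); they are not the data of Figure 1.

```python
from itertools import combinations


def is_dire(committee, groups, div_lower, winners, rep_lower):
    """True if the committee meets every diversity and representation bound."""
    committee = set(committee)
    return all(len(committee & groups[g]) >= div_lower[g] for g in groups) and all(
        len(committee & winners[p]) >= rep_lower[p] for p in winners
    )


def drcwd_brute_force(scores, k, groups, div_lower, winners, rep_lower):
    """Return (committee, score) of the best feasible size-k committee.

    Assumes a separable rule: committee score = sum of candidate scores.
    Ties go to the lexicographically first committee. Returns (None, None)
    if no committee satisfies the constraints.
    """
    best, best_score = None, None
    for committee in combinations(sorted(scores), k):
        if not is_dire(committee, groups, div_lower, winners, rep_lower):
            continue
        total = sum(scores[c] for c in committee)
        if best_score is None or total > best_score:
            best, best_score = committee, total
    return best, best_score


# Hypothetical per-candidate scores and attributes (not the paper's data).
scores = {"c1": 9, "c2": 8, "c3": 4, "c4": 3}
groups = {"male": {"c1", "c2"}, "female": {"c3", "c4"}}
div_lower = {"male": 1, "female": 1}
winners = {"California": {"c1", "c2"}, "Illinois": {"c2", "c4"}}
rep_lower = {"California": 1, "Illinois": 1}

print(drcwd_brute_force(scores, 2, {}, {}, {}, {}))              # unconstrained
print(drcwd_brute_force(scores, 2, groups, div_lower, {}, {}))   # diversity only
print(drcwd_brute_force(scores, 2, groups, div_lower, winners, rep_lower))
```

Passing empty dictionaries for the groups or populations switches the corresponding constraints off, which mirrors how the unconstrained and diversity-only problems arise as special cases of the model.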

We note that we denote the possible values that $\mu$ and $\pi$ can take inside the parentheses. For example, ‘($\mu$, 0, $f$)-DRCWD’ implies that we are specifying the setting $\pi = 0$. We use the same notation for $\mu$ and for $f$.

Observation 1.

($\mu$, $\pi$, $f$)-DRCWD is a generalized version of ($\mu$, $\pi$)-DRCF and UCWD. Hence, if ($\mu$, $\pi$, $f$)-DRCWD is polynomial-time computable, then so are the corresponding UCWD and ($\mu$, $\pi$)-DRCF problems. If either UCWD is NP-hard or ($\mu$, $\pi$)-DRCF is NP-hard, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

4.1 Generality of ($\mu$, $\pi$, $f$)-DRCWD

Our model is general in that it provides the flexibility to specify the diversity and representation constraints and to select the voting rule. Thus, in this section we define the diverse committee problem [bredereck2018multiwinner, celis2017multiwinner] and the apportionment problem [brill2018multiwinner, hodge2018mathematics] as special cases of ($\mu$, $\pi$, $f$)-DRCWD.

($\mu$, 0, $f$)-DRCWD and Diverse Committee Problem.

We define the diverse committee problem in our model [bredereck2018multiwinner, celis2017multiwinner]: In the diverse committee problem, we are given an instance of UCWD that consists of a set of candidate groups $\mathcal{G}$ and the corresponding diversity constraints, lower bound $l_G$ and upper bound $u_G$, for all $G \in \mathcal{G}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of the diverse committee problem is to select a committee $W \in \mathcal{W}$ that maximizes $f(W)$ among the committees that satisfy the constraints.

It is clear that ($\mu$, 0, $f$)-DRCWD, i.e., without the presence of any voter attributes, is equivalent to the diverse committee problem. As we do not use upper bounds, the equivalence holds when the upper bound in the diverse committee model is equal to the size of the group for all groups and the minimum value that the lower bound can take is 1 for all groups. This is in line with the approach used in Theorem 6 of Celis et al. [celis2017multiwinner]. Formally, $u_G = |G|$ and $l_G \geq 1$ for all $G \in \mathcal{G}$.

(0, 1, $f$)-DRCWD and Apportionment Problem.

We define the apportionment problem in our model [brill2018multiwinner]: In the apportionment problem, we are given an instance of UCWD that consists of a set of disjoint voter populations $\mathcal{P}$ over one attribute and winning committees $W_P$ for all $P \in \mathcal{P}$. Let $\mathcal{W}$ denote the family of all size-$k$ committees. The goal of the apportionment problem is to select a committee $W \in \mathcal{W}$ that maximizes $f(W)$ among all the committees that satisfy the lower quota, i.e., $\forall P \in \mathcal{P}$, $|W_P \cap W| \geq \lfloor \frac{|P|}{n} \cdot k \rfloor$.

It is easy to see that (0, 1, )-DRCWD, which consists of zero candidate attributes and one voter attribute, is same as the apportionment problem if we set the representation constraint of each population to be equal to the lower quota of the apportionment problem. Formally, , = . This generality holds on a realistic assumption that .

Finally, we note that our model can be adapted to accept approval votes as an input. Thus, if each population is completely cohesive within itself, the representation constraints can be set to formulate known representation methods like proportional justified representation [sanchez2017proportional] and extended justified representation [aziz2017justified] as ($\mu$, $\pi$, $f$)-DRCWD.

5 Complexity Results

Instance of ($\mu$, $\pi$, $f$)-DRCWD | Computational Complexity
($\leq 2$, 0, separable)-DRCWD | P (Lem. 1)
($\geq 3$, 0, separable)-DRCWD | NP-hard (Thm. 3, Thm. 4)
($\geq 0$, $\geq 1$, separable)-DRCWD | NP-hard (Thm. 5, Cor. 2)
($\geq 0$, $\geq 0$, submodular)-DRCWD | NP-hard (Thm. 6, Cor. 3)
Table 1: A summary of the complexity of ($\mu$, $\pi$, $f$)-DRCWD (Theorem 1, Corollary 1). The values in brackets for $\mu$ and $\pi$ denote that the results hold for all non-negative integers $\mu$ and all non-negative integers $\pi$ that satisfy the condition stated in the brackets. The results are under the assumption P $\neq$ NP. ‘Lem.’ denotes Lemma. ‘Thm.’ denotes Theorem. ‘Cor.’ denotes Corollary.

In this section, we give a classification of the computational complexity of the ($\mu$, $\pi$, $f$)-DRCWD problem under different settings. (Footnote 2: The hardness, inapproximability and parameterized complexity results throughout the paper are under the assumption P $\neq$ NP.)

Finding a committee using a submodular scoring function like the utilitarian version of the Chamberlin-Courant rule is known to be NP-hard [procaccia2008complexity], and selecting a diverse committee when a candidate belongs to three groups is also known to be NP-hard [bredereck2018multiwinner, celis2017multiwinner]. However, the proofs of these hardness results are fragmented over several papers and use reductions from several well-known NP-hard problems. For instance, the proof of hardness for the Chamberlin-Courant rule uses a reduction from exact 3-cover [procaccia2008complexity], and the proofs of hardness for computing a diverse committee use reductions from 3-dimensional matching [bredereck2018multiwinner] and 3-hypergraph matching [celis2017multiwinner]. Moreover, we are the first to introduce representation constraints, and the hardness due to their use is unknown. Hence, in this section, we provide a complete classification of the ($\mu$, $\pi$, $f$)-DRCWD problem by giving reductions from a single well-known NP-hard problem, namely, the vertex cover problem, inspired by the similar approach used in [chakraborty2021classifying].

Finally, we note that as the following classification holds for every integer $\mu \geq 0$ (specifically, every whole number, as $\mu$ cannot be negative) and every integer $\pi \geq 0$, our reductions are designed for the same range of values.

Theorem 1.

Let $\mu, \pi \in \mathbb{Z}$: $\mu, \pi \geq 0$ and $f$ be a committee selection rule, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

Corollary 1.

Classification of Complexity of ($\mu$, $\pi$, $f$)-DRCWD.

  1. If $\mu \geq 0$, $\pi \geq 0$, and the committee selection rule $f$ is a monotone, submodular function, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

  2. If $\mu \in [0, 2]$, $\pi = 0$, and the committee selection rule $f$ is a monotone, separable function, then ($\mu$, $\pi$, $f$)-DRCWD is in P.

  3. If $\mu \geq 3$, $\pi = 0$, and the committee selection rule $f$ is a monotone, separable function, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

  4. If $\mu \geq 0$, $\pi \geq 1$, and the committee selection rule $f$ is a monotone, separable function, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

5.1 Tractable Case

Theorem 2.

[Theorem 21, Corollary 22 in Celis et al. [celis2017multiwinner]] The diverse committee feasibility problem can be solved in polynomial time when $\Delta$ = 2, where $\Delta$ denotes the maximum number of groups in which any candidate can be.

Without loss of generality (W.l.o.g.), the above theorem holds when it is assumed that . Hence, it holds for all : 2. Therefore, based on the relationship between (, 0, )-DRCWD and Diverse Committee Problem (Section 4.1), we prove the following lemma, which in turn proves the statement in Corollary 1(2):

Lemma 1.

If $\mu \in [0, 2]$, $\pi = 0$, and the committee selection rule $f$ is a monotone, separable function, then ($\mu$, $\pi$, $f$)-DRCWD is in P.

Proof.

When $\pi$ = 0, there are no voter attributes or representation constraints, and hence, the ($\mu$, $\pi$, $f$)-DRCWD problem is equivalent to the diverse committee problem. Moreover, when $f$ is a monotone, separable function, the complexity of ($\mu$, $\pi$, $f$)-DRCWD is equivalent to the complexity of ($\mu$, $\pi$)-DRCF. Thus, the polynomial-time result for the diverse committee feasibility problem when the number of groups a candidate belongs to is equal to two, which in our model implies that the number of candidate attributes is equal to two ($\mu = 2$), holds for our setting (Theorem 9 [bredereck2018multiwinner], Corollary 22 (full-version) [celis2017multiwinner]).

More specifically, when $\mu = 2$, we use the algorithm given in the proof of Theorem 21 by Celis et al. [celis2017multiwinner] and set the upper bound equal to the group size. Formally, $u_G = |G|$ for all $G \in \mathcal{G}$.

Next, when $\mu \leq 1$, a straightforward algorithm that selects the top $l^D_G$ scoring candidates for all $G \in \mathcal{G}$ results in a DiRe committee, which satisfies the diversity constraints $l^D_G$. ∎
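The last step of the proof can be sketched as follows, under two assumptions that the text above does not spell out: the case concerns a single candidate attribute (so the groups are disjoint), and any seats left after meeting the lower bounds are filled with the highest-scoring remaining candidates. The function name, the scores, and the groups are hypothetical.

```python
def greedy_single_attribute(scores, groups, lower, k):
    """Greedy committee selection for one candidate attribute and a separable rule.

    scores: candidate -> individual score
    groups: group name -> set of candidates (assumed pairwise disjoint)
    lower:  group name -> minimum number of committee members from that group
    Raises ValueError if the lower bounds cannot be met within k seats.
    """
    chosen = []
    for g, members in groups.items():
        if len(members) < lower[g]:
            raise ValueError(f"group {g!r} has too few candidates")
        # Take the top-scoring candidates of the group to meet its lower bound.
        ranked = sorted(members, key=lambda c: (-scores[c], c))
        chosen.extend(ranked[: lower[g]])
    if len(chosen) > k:
        raise ValueError("lower bounds exceed the committee size")
    # Assumed completion step: fill remaining seats with the best leftovers.
    leftovers = sorted(
        (c for c in scores if c not in chosen), key=lambda c: (-scores[c], c)
    )
    chosen.extend(leftovers[: k - len(chosen)])
    return sorted(chosen)


# Hypothetical instance.
scores = {"c1": 9, "c2": 8, "c3": 4, "c4": 3}
groups = {"male": {"c1", "c2"}, "female": {"c3", "c4"}}
lower = {"male": 1, "female": 1}

print(greedy_single_attribute(scores, groups, lower, k=2))
print(greedy_single_attribute(scores, groups, lower, k=3))
```

With disjoint groups, a candidate picked for one group never competes for another group's quota, which is what makes this greedy choice safe; with overlapping groups (two or more attributes) that argument no longer applies.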

5.2 Hardness Results

NP-hard problem used.

As discussed earlier, the NP-hardness of ($\mu$, $\pi$, $f$)-DRCWD when using representation constraints is unknown. Moreover, the known hardness results for using a submodular but not separable scoring function and for diverse committee selection problems were established via reductions from different NP-hard problems. We will establish the NP-hardness of ($\mu$, $\pi$, $f$)-DRCWD for various settings of $\mu$, $\pi$, and $f$ via reductions from a single well-known NP-hard problem, namely, the vertex cover problem on 3-regular, 2-uniform hypergraphs [garey1979computers, alimonti1997hardness]. (Footnote 3: A 3-regular graph stipulates that each vertex is connected to exactly three other vertices, each one with an edge, i.e., each vertex has a degree of 3. The VC problem on 3-regular graphs is NP-hard. We use 3-regular graphs to exploit their structure to prove the hardness of ($\mu$, $\pi$, $f$)-DRCWD w.r.t. diversity constraints (Theorem 3 and Theorem 4). We note that the reductions used in the proofs of Theorem 5 and Theorem 6 do not need 3-regular graphs and hold for the VC problem on arbitrary graphs as well.) (Footnote 4: The size of hyperedges has implications for the hardness of approximation and parameterized complexity results, and hence we mention it here. For the complexity results, we use 2-uniform hypergraphs only.)

Definition 7.

Vertex Cover (VC) problem: Given a graph $G$ consisting of a set of vertices $X = \{x_1, \dots, x_n\}$ and a set of edges $E = \{e_1, \dots, e_m\}$, where each edge connects two vertices in $X$ such that an edge $e \in E$ corresponds to a 2-element subset of $X$, a vertex cover of $G$ is a subset $S$ of vertices such that each edge contains at least one vertex from $S$ (i.e., $e \cap S \neq \emptyset$ for each edge $e \in E$). The vertex cover problem is to find the minimum vertex cover of $G$.
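To make Definition 7 concrete, the sketch below checks whether a vertex set is a cover and finds a minimum cover by exhaustive search. It is an exponential-time illustration only (the problem is NP-hard), and the function names and example graph are hypothetical. The example uses the complete graph on four vertices, which is 3-regular.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def min_vertex_cover(vertices, edges):
    """Smallest vertex cover found by trying subsets in increasing size."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)


# Complete graph on 4 vertices: every vertex has degree 3.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

print(is_vertex_cover(edges, {1, 2}))
print(min_vertex_cover(vertices, edges))
```

In this graph any two uncovered vertices would leave the edge between them uncovered, so every cover needs at least three vertices.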

5.2.1 ($\mu$, $\pi$, $f$)-DRCWD w.r.t. diversity constraints

When $\pi = 0$, ($\mu$, $\pi$, $f$)-DRCWD is equivalent to the diverse committee selection problem. However, the hardness of ($\mu$, $\pi$, $f$)-DRCWD when $\mu \geq 3$ does not follow from the hardness of the diverse committee selection problem when the number of groups that a candidate can belong to is greater than or equal to 3 [bredereck2018multiwinner, celis2017multiwinner], as the reductions in these papers are specifically for the case when $\mu = 3$.

More specifically, Theorem 9 of Bredereck et al. [bredereck2018multiwinner] uses a reduction from 3-Dimensional Matching that only holds for instances when the number of groups that a candidate can belong to is exactly 3. Also, they set the lower bound and the upper bound to 1, which is mathematically different from our setting where we only allow lower bounds. On the other hand, Theorem 6 (“NP-hardness of feasibility: $\Delta \geq 3$”) of Celis et al. [celis2017multiwinner] uses two reductions (Footnote 5: In Celis et al. [celis2017multiwinner], $\Delta$ denotes “the maximum number of groups in which any candidate can be”.): the first reduction, from $\Delta$-hypergraph matching, is indeed for the case when the number of groups that a candidate can belong to is greater than or equal to 3, but it is limited to instances where the lower bound is set to 0 and the upper bound to 1, which is a trivial case in our setting as we only use lower bounds and do not allow upper bounds. Moreover, in principle, the reduction from $\Delta$-hypergraph matching uses a different problem for each $\Delta$, as when $\Delta \neq \Delta'$, $\Delta$-hypergraph matching and $\Delta'$-hypergraph matching are separate problems. The second reduction, from 3-regular vertex cover, is for instances when the number of groups that a candidate can belong to is exactly 3.

Hence, in this section, we give a reduction from a single known NP-hard problem, namely the vertex cover problem, such that our result holds for all $\mu \in \mathbb{Z}$ with $\mu \geq 3$. Also, the reductions are designed such that the real-world stipulation that each candidate attribute $A_i$, $\forall i \in [\mu]$, partitions all the candidates into two or more groups holds. The next two theorems help us prove the statement in Corollary 1(3).

Theorem 3.

If $\mu \geq 3$ and $\mu$ is an odd number, $\pi = 0$, and the committee selection rule $f$ is a monotone, separable function, then ($\mu$, $\pi$, $f$)-DRCWD is NP-hard.

Proof.

For our reduction, we use the $k$-Borda committee selection rule as an example of a monotone, separable function.

We reduce an instance of vertex cover (VC) problem to an instance of (, , )-DRCWD. We have one candidate for each vertex , and dummy candidates where corresponds to the number of vertices in the graph and is the number of candidate attributes. Formally, we set = {} and the dummy candidate set = {}. Hence, the candidate set = is of size candidates. We set the target committee size to be .

Next, we have candidate attributes. Each edge that connects vertices and correspond to a candidate group that contains two candidates and . As our reduction proceeds from a 3-regular graph, each vertex is connected to three edges. This corresponds to each candidate having three attributes and thus, belonging to three groups. Next, for each of the candidates , we have blocks of dummy candidates and each block contains dummy candidates . Thus, we have a total of dummy candidates, which equals to dummy candidates. Next, each block of candidates contains 3 sets of candidates: Set contains one candidate and Sets and contain candidates each. Specifically, each of the blocks for each candidate is constructed as follows:

  • Set consists of single dummy candidate, .

  • Set consists of dummy candidates, for all .

  • Set consists of dummy candidates, for all .

Each candidate in the block has attributes, and the candidates are grouped as follows:

  • The dummy candidate is in the same group as candidate . It is also in groups, individually with each of dummy candidates, . Thus, the dummy candidate has attributes and is part of groups.

  • Each dummy candidate is in the same group as as described in the previous point. It is also in groups, individually with each of dummy candidates, . Thus, each dummy candidate has attributes and is part of groups.

  • Each dummy candidate is in groups, individually with each of dummy candidates, , as described in the previous point. Next, note that as is an odd number, is an even number, which means Set has an even number of candidates. Hence, we randomly divide candidates into two partitions. Then, we create groups over one attribute where each group contains two candidates from Set such that one candidate is selected from each of the two partitions without replacement. Thus, each pair of groups is mutually disjoint, and each dummy candidate is part of exactly one group that is shared with exactly one other dummy candidate where . Overall, this construction results in one attribute and one group for each dummy candidate . Hence, each dummy candidate has attributes and is part of groups.

As a result of the grouping described above, each candidate also has attributes and is part of groups. Note that each candidate already had three attributes and was part of three groups due to our reduction from the vertex cover problem on 3-regular graphs. Additionally, we added blocks of dummy candidates and grouped candidate with candidate from each of the blocks. Hence, each candidate has attributes and is part of groups. We set for all , which corresponds to the requirement that each edge of the graph be covered by some chosen vertex.

Finally, we introduce voters. For simplicity, let denote the candidate in set . The first voter ranks the candidates based on their indices.

The second voter improves the rank of each candidate by one position but places the top-ranked candidate to the last position.

Next, the third voter further improves the rank of each candidate by one position but places the top-ranked candidate to the last position.

Similarly, all the voters rank the candidates based on this method. Hence, the last voter will have the following ranking:

Thus, each candidate occupies each of the positions once. Hence, all candidates get the same Borda score of

Next, computing any highest-scoring -Borda committee takes time polynomial in the size of the input where , . Moreover, in the constructed instance, as all candidates bring the same utility to the committee, every -sized committee is a highest-scoring committee with a of

and hence, the NP-hardness of the problem is due to finding a feasible committee that satisfies for all , where .
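The cyclic voter construction above can be checked mechanically. The following is a minimal sketch, not the authors' code: it assumes the Borda convention in which a candidate at 0-based position p among m candidates earns m - 1 - p points (the paper's convention may differ by a constant), and the candidate names are hypothetical placeholders.

```python
def cyclic_profile(candidates):
    """One voter per candidate: voter i's ranking is the first voter's
    ranking rotated left by i, so the previous top candidate moves last."""
    m = len(candidates)
    return [candidates[i:] + candidates[:i] for i in range(m)]


def borda_scores(profile):
    """Assumed Borda convention: position p (0-based) earns m - 1 - p points."""
    m = len(profile[0])
    scores = {c: 0 for c in profile[0]}
    for ranking in profile:
        for pos, cand in enumerate(ranking):
            scores[cand] += m - 1 - pos
    return scores


# Hypothetical instance with 7 candidates (names are placeholders).
candidates = [f"a{i}" for i in range(1, 8)]
profile = cyclic_profile(candidates)
scores = borda_scores(profile)

# Every candidate occupies every position exactly once, so all scores
# equal 0 + 1 + ... + (m - 1) = m(m - 1) / 2 under the assumed convention.
m = len(candidates)
assert set(scores.values()) == {m * (m - 1) // 2}
```

Because all scores tie, the voting rule cannot distinguish between committees of the target size, which is why the hardness rests entirely on satisfying the diversity constraints.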

Finally, there are no voter attributes as and hence, no representation constraints (. This completes our construction for the reduction, which is a polynomial time reduction in the size of and . Note that we assume that the number of candidate attributes is always less than the number of candidates . More specifically, our reduction holds when , which is a realistic assumption as we ideally expect to be very small [celis2017multiwinner].

For the proof of correctness, we show the following:

Claim 1.

We have a vertex cover of size at most that satisfies for all if and only if we have at least one committee of size at most that satisfies all the diversity constraints, which means that for all , which equals as for all .

(⇒) If the instance of the VC problem is a yes instance, then the corresponding instance of (, , )-DRCWD is a yes instance as each and every candidate group will have at least one of their members in the winning committee , i.e., for all . Note that we have set for all .

More specifically, for each block of candidates, we select one dummy candidate from Set and all dummy candidates from Set . This helps to satisfy the condition for all candidate groups that contain at least one dummy candidate . Overall, we select candidates from blocks for each of the candidates that correspond to vertices in the vertex cover. This results in candidates in the committee. Next, for groups that do not contain any dummy candidates, select candidates that correspond to vertices that form the vertex cover. These candidates satisfy the constraints. Specifically, these candidates satisfy for all the candidate groups that do not contain any dummy candidates. Hence, we have a committee of size .

(⇐) The instance of the (, , )-DRCWD is a yes instance when we have candidates in the committee. This means that each and every group will have at least one of their members in the winning committee , i.e., for all . Then the corresponding instance of the VC problem is a yes instance as well. This is because the vertices that form the vertex cover correspond to the candidates that satisfy for all the candidate groups that do not contain any dummy candidates. This completes the proof. ∎
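The core correspondence in this proof, between vertex covers and committees that meet a lower bound of one on every edge group, can be illustrated on a small 3-regular graph. This is only a sketch under simplifying assumptions: it omits the dummy blocks entirely, uses the complete graph on four vertices as a hypothetical instance, and finds the smallest feasible committee by brute force. All function names are illustrative.

```python
from itertools import combinations


def edge_groups(edges):
    """Each edge {u, v} becomes a candidate group containing its two endpoints."""
    return [frozenset(edge) for edge in edges]


def satisfies_lower_bounds(committee, groups, lower=1):
    """Diversity check: every group has at least `lower` members on the committee."""
    return all(len(group & committee) >= lower for group in groups)


def smallest_feasible_committee(vertices, groups):
    """Brute force (exponential): the smallest committee meeting all lower bounds."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            if satisfies_lower_bounds(set(subset), groups):
                return set(subset)
    return None


# Hypothetical instance: K4, the smallest 3-regular graph.
vertices = [1, 2, 3, 4]
edges = list(combinations(vertices, 2))
groups = edge_groups(edges)

# 3-regularity means each candidate belongs to exactly three edge groups.
membership = {v: sum(v in g for g in groups) for v in vertices}

best = smallest_feasible_committee(vertices, groups)
```

On this instance a committee satisfies the edge-group constraints exactly when it is a vertex cover, so `best` is a minimum vertex cover of the graph.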

Theorem 4.

If and is an even number, , and the committee selection rule is a monotone, separable function, then (, , )-DRCWD is NP-hard.

Proof.

For our reduction, we use the -Borda committee selection rule as an example of a monotone, separable function.

We reduce an instance of the vertex cover (VC) problem to an instance of (, , )-DRCWD. We have two candidates and for each vertex , and dummy candidates where corresponds to the number of vertices in the graph and is the number of candidate attributes. Formally, we set = {} {} and the dummy candidate set = {}. Hence, the candidate set = is of size candidates. We set the target committee size to be .

Next, we have candidate attributes. Each edge that connects vertices and corresponds to two candidate groups such that group contains two candidates and that correspond to vertices and and the group contains two candidates and that also correspond to vertices and . Note that by having candidates in , we are in fact duplicating the graph . As our reduction proceeds from a 3-regular graph, each vertex is connected to three edges. This corresponds to each candidate having three attributes and thus belonging to three groups. Next, for each candidate , we have blocks of dummy candidates, each block containing dummy candidates . Thus, we have a total of dummy candidates, which equals dummy candidates. Next, each block of candidates contains 3 sets of candidates: Set contains one candidate and Sets and contain candidates each. Specifically, each of the blocks for each candidate is constructed as follows, in line with the construction in the proof of Theorem 3:

  • Set consists of single dummy candidate, .

  • Set consists of dummy candidates, for all .

  • Set consists of dummy candidates, for all .

Each candidate in the block has attributes, and the candidates are grouped as follows:

  • The dummy candidate is in the same group as candidate . It is also in groups, individually with each of dummy candidates, . Thus, the dummy candidate has attributes and is part of groups.

  • Each dummy candidate is in the same group as as described in the previous point. It is also in groups, individually with each of dummy candidates, . Thus, each dummy candidate has attributes and is part of groups.

Note that the grouping of the candidates in Set differs significantly from the construction in the proof of Theorem 3:

  • Each dummy candidate is in groups, individually with each of dummy candidates, , as described in the previous point. Next, note that as is an even number, is an odd number, which means Set has an odd number of candidates. Hence, we randomly divide candidates into two partitions. Then, we create groups over one attribute where each group contains two candidates from Set such that one candidate is selected from each of the two partitions without replacement. Thus, each pair of groups is mutually disjoint. Hence, each dummy candidate is part of exactly one group that is shared with exactly one other dummy candidate where . Overall, this construction results in one attribute and one group for all but one dummy candidate , which results in a total of attributes and groups for these candidates. This is because groups can hold candidates. Hence, one candidate still has attributes and is part of groups. If this block of dummy candidates is for candidate , then another corresponding block of dummy candidates for candidate will also have one candidate who has attributes and is part of groups. We group these two candidates from separate blocks. Hence, that one remaining candidate now also has attributes and is part of groups. As there is always an even number of candidates in set (), such cross-block grouping of candidates among a total of blocks, also an even number, is always possible.
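The difference between the two pairing schemes (an even-sized set in Theorem 3, an odd-sized set in Theorem 4) can be sketched as follows. This is an illustration only: the set sizes and candidate names are hypothetical, and the split into two partitions is deterministic here, whereas the construction above divides the candidates randomly.

```python
def pair_within_set(members):
    """Split `members` into two equal partitions and pair them one-to-one.

    Returns (pairs, leftover), where leftover is None for an even-sized
    set and the single unpaired member for an odd-sized set.
    """
    half = len(members) // 2
    left, right = members[:half], members[half:2 * half]
    pairs = list(zip(left, right))
    leftover = members[2 * half] if len(members) % 2 else None
    return pairs, leftover


def pair_leftovers_across_blocks(leftovers):
    """Group the unpaired members of consecutive blocks two at a time.

    Requires an even number of blocks, as in the construction above.
    """
    if len(leftovers) % 2:
        raise ValueError("cross-block pairing needs an even number of blocks")
    return [(leftovers[i], leftovers[i + 1]) for i in range(0, len(leftovers), 2)]


# Even-sized set (Theorem 3 style): every member gets a partner in its own block.
even_pairs, even_leftover = pair_within_set(["x1", "x2", "x3", "x4"])

# Odd-sized sets (Theorem 4 style): one member per block stays unpaired and
# is grouped with the unpaired member of the corresponding duplicate block.
pairs_a, left_a = pair_within_set(["p1", "p2", "p3"])
pairs_b, left_b = pair_within_set(["q1", "q2", "q3"])
cross_pairs = pair_leftovers_across_blocks([left_a, left_b])
```

After the cross-block step every member of every set belongs to exactly one two-member group, so all dummy candidates in these sets end up with the same number of attributes.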

As a result of the grouping described above, each candidate also has attributes and is part of groups. Note that each candidate already had three attributes and was part of three groups due to our reduction from the vertex cover problem on 3-regular graphs. Additionally, we added blocks of dummy candidates and grouped candidate with candidate from each of the blocks. Hence, each candidate has attributes and is part of groups. We set for all , which corresponds to the requirement that each edge of the graph be covered by some chosen vertex.

Finally, we introduce voters, in line with our reduction in proof of Theorem 3. For simplicity, let denote the candidate in set . The first voter ranks the candidates based on their indices.

The second voter improves the rank of each candidate by one position but places the top-ranked candidate to the last position.

Similarly, all the voters rank the candidates based on this method. Hence, the last voter will have the following ranking:

Thus, each candidate occupies each of the positions once. Hence, all candidates get the same Borda score of

Next, computing any highest-scoring -Borda committee takes time polynomial in the size of the input where , . Moreover, in the constructed instance, as all candidates bring the same utility to the committee, every -sized committee is a highest-scoring committee with a of

and hence, the NP-hardness of the problem is due to finding a feasible committee that satisfies for all , where .

Finally, there are no voter attributes as and hence, no representation constraints (. This completes our construction for the reduction, which is a polynomial time reduction in the size of and . Note that we assume that the number of candidate attributes is always less than the number of candidates . More specifically, our reduction holds when , which is a realistic assumption as we ideally expect to be very small [celis2017multiwinner].

For the proof of correctness, we show the following:

Claim 2.

We have a vertex cover of size at most that satisfies for all if and only if we have at least one committee of size at most that satisfies all the diversity constraints, which means that for all , which equals as for all .

(⇒) If the instance of the VC problem is a yes instance, then the corresponding instance of (, , )-DRCWD is a yes instance as each and every candidate group will have at least one of their members in the winning committee , i.e., for all . Note that we have set for all .

More specifically, for each block of candidates, we select one dummy candidate from Set and all dummy candidates from Set . This helps to satisfy the condition for all candidate groups that contain at least one dummy candidate . Overall, we select candidates from blocks for each of the candidates that correspond to vertices in the vertex cover. This results in candidates in the committee. Next, for groups that do not contain any dummy candidates, select candidates that correspond to vertices that form the vertex cover. These candidates satisfy the constraints. Specifically, these candidates satisfy for all the candidate groups that do not contain any dummy candidates. Hence, we have a committee of size .

(⇐) The instance of the (, , )-DRCWD is a yes instance when we have candidates in the committee. This means that each and every group will have at least one of their members in the winning committee , i.e., for all . Then the corresponding instance of the VC problem is a yes instance as well. This is because the vertices that form the vertex cover correspond to the candidates that satisfy for all the candidate groups that do not contain any dummy candidates. Recall that we constructed candidates in the instance of the (, , )-DRCWD problem that correspond to vertices in the VC problem, which means that we need candidates instead of candidates to satisfy the diversity constraints for candidate groups that do not contain any dummy candidates. This completes the proof. ∎

5.2.2 (, , )-DRCWD w.r.t. representation constraints

We now study the computational complexity of (, , )-DRCWD due to the presence of voter attributes. The following theorem helps us prove the statement in Corollary 1(4).

Theorem 5.

If , , and the committee selection rule is a monotone, separable function, then (, , )-DRCWD is NP-hard.

Proof.

For our reduction, we use the -Borda committee selection rule as an example of a monotone, separable function.

We reduce an instance of the vertex cover (VC) problem to an instance of (, , )-DRCWD. We have one candidate for each vertex , and dummy candidates where corresponds to the number of edges and corresponds to the number of vertices in the graph . Formally, we set = {} and the dummy candidate set = {}. Hence, the candidate set = consists of candidates. We set the target committee size to be .

We now introduce one voter for each edge . More specifically, an edge connects vertices and . Then, the corresponding voter ranks the candidates in the following collection of sets , , , such that :

  • Set : candidates and that correspond to vertices and are ranked at the top two positions, ordered based on their indices. For voter where , we denote the candidates and as and .

  • Set : out of () dummy candidates are ranked in the next positions, again ordered based on their indices. For each voter, these candidates are distinct as shown below. Hence, for all pairs of voters , we know that .

  • Set : the next positions are occupied by the remaining candidates in that correspond to the vertices in graph , ordered based on their indices.

  • Set : the last positions are occupied by the remaining dummy candidates in , ordered based on their indices.

More specifically, the voters rank the candidates as shown below:

Voter Set Set Set Set
1
2
3

Next, there are no candidate attributes as and hence, no diversity constraints (. The voters are divided into disjoint populations over one or more attributes as . Consider the base case when : Here, each voter belongs to one population . Hence, we have populations of voters (). We set the representation constraint to 1. Hence, for all . The winning committee for each population will consist of the top -ranked candidates in the ranking of the voter in population . This is because we are using the Borda scoring rule to assign scores to the candidates, and the top- ranked candidates will indeed be the top- highest scoring candidates.
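The base case can be made concrete with a small sketch of the voter rankings and the representation check. Several details are assumptions rather than values taken from the construction: the graph is a triangle, each voter has one distinct dummy candidate ranked directly after its two endpoint candidates, the prefix length k is set to 3, and all names are hypothetical.

```python
def build_ranking(edge, voter_index, vertex_ids, num_voters, dummies_per_voter):
    """Ranking of the voter for `edge`, following the four sets above:
    its two endpoint candidates, its own distinct dummy candidates, the
    remaining vertex candidates, then the remaining dummy candidates."""
    u, v = sorted(edge)
    own = [f"d{voter_index}_{j}" for j in range(dummies_per_voter)]
    rest_vertices = [f"c{w}" for w in vertex_ids if w not in (u, v)]
    rest_dummies = [
        f"d{i}_{j}"
        for i in range(num_voters)
        if i != voter_index
        for j in range(dummies_per_voter)
    ]
    return [f"c{u}", f"c{v}"] + own + rest_vertices + rest_dummies


def satisfies_representation(committee, rankings, k, lower=1):
    """Each single-voter population's winning committee is the top-k prefix
    of its voter's ranking; the committee must contain at least `lower`
    candidates from every such prefix."""
    return all(len(set(r[:k]) & committee) >= lower for r in rankings)


# Hypothetical instance: a triangle, one voter (and one population) per edge.
vertex_ids = [1, 2, 3]
edges = [(1, 2), (2, 3), (1, 3)]
k = 3                   # assumed prefix length
dummies_per_voter = 1   # assumed: k - 2 distinct dummy candidates per voter

rankings = [
    build_ranking(e, i, vertex_ids, len(edges), dummies_per_voter)
    for i, e in enumerate(edges)
]

covers_all = satisfies_representation({"c1", "c2"}, rankings, k)
misses_one = satisfies_representation({"c1"}, rankings, k)
```

Here `{"c1", "c2"}` corresponds to a vertex cover of the triangle and represents every population, while `{"c1"}` leaves the population of the voter for edge (2, 3) unrepresented.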

Now, w.l.o.g., assume that and some voter belongs to and where