Multiwinner Elections with Diversity Constraints

11/17/2017 ∙ by Robert Bredereck, et al. ∙ AGH ∙ University of Oxford ∙ Berlin Institute of Technology (Technische Universität Berlin) ∙ TU Wien

We develop a model of multiwinner elections that combines performance-based measures of the quality of the committee (e.g., Borda scores of the committee members) with diversity constraints. Specifically, we assume that the candidates have certain attributes (such as being a male or a female, being junior or senior, etc.) and the goal is to elect a committee that, on the one hand, has as high a score as possible regarding a given performance measure, but that, on the other hand, meets certain requirements (e.g., of the form "at least 30% of the committee members are junior candidates and at least 40% are females"). We analyze the computational complexity of computing winning committees in this model, obtaining polynomial-time algorithms (exact and approximate) and NP-hardness results. We focus on several natural classes of voting rules and diversity constraints.


1 Introduction

We study the problem of computing committees (i.e., sets of candidates) that, on the one hand, are of high quality (e.g., consist of high-performing individuals) and that, on the other hand, are diverse (as specified by a set of constraints). The following example shows our problem in more concrete terms.

Consider an organization that wants to hold a research meeting on some interdisciplinary topic such as “AI and Economics.” The meeting will take place in some secluded location and only a limited number of researchers can attend. How should the organizers choose the researchers to invite? If their main criterion were the number of highly influential AI/economics papers that each person published, then they would likely end up with a very homogeneous group of highly-respected AI professors. Thus, while this criterion definitely should be important, the organizers might put forward additional constraints. For example, they could require that at least 30% of the attendees are junior researchers, at least 40% are female, at least a few economists are invited (but only senior ones), the majority of attendees work on AI, and the attendees come from at least 3 continents and represent at least 10 different countries. (For example, the Leibniz-Zentrum für Informatik, which runs Dagstuhl Seminars, gives similar suggestions to event organizers.) In other words, the organizers would still seek researchers with high numbers of strong publications, but they would give priority to making the seminar more diverse (indeed, junior researchers or representatives of different subareas of AI can provide new perspectives; it is also important to understand what people working in economics have to say, but the organizers would prefer to learn from established researchers and not from junior ones).

The above example shows a number of key features of our committee-selection model. First, we assume that there is some function that evaluates the committees (we refer to it as the objective function). In the example it was (implicitly) the number of high-quality papers that the members of the committee published. In other settings (e.g., if we were shortlisting job candidates) these could be aggregated opinions of a group of voters (the recruitment committee, in the shortlisting example).

Second, we assume that each prospective committee member (i.e., each researcher in our example) has a number of attributes, which we call labels. For example, a researcher can be junior or senior, a male or a female, can work in AI or in economics or in some other area, etc. Further, the way in which labels are assigned to the candidates may have a structure of its own. For example, each researcher is either male or female and either junior or senior, but otherwise these attributes are independent (i.e., any combination of gender and seniority level is possible). Other labels may be interdependent and may form hierarchical structures (e.g., every researcher based in Germany is also labeled as representing Europe). Yet other labels may be completely unstructured; e.g., researchers can specialize in many subareas of AI, irrespective of how (un)related they seem.

Third, we assume that there is a formalism that specifies when a committee is diverse. In principle, this formalism could be any function that takes a committee and gives an accept/reject answer. However, in many typical settings it suffices to consider simple constraints that regard each label separately (e.g., “at least 30% of the researchers are junior” or “the number of male researchers is even”). We focus on such independent constraints, but studying more involved ones, that regard multiple labels (e.g., “all invited economists must be senior researchers”) would also be interesting.

Our goal is to find a committee of a given size that is diverse and has the highest possible score from the objective function. While similar problems have already been considered (see the Related Work section), we believe that our paper is the first to systematically study the problem of selecting a diverse committee, where diversity is evaluated with respect to candidate attributes. We provide the following main contributions:

  1. We formally define the general problem of selecting a diverse committee and we provide its natural restrictions. Specifically, we focus on the case of submodular objective functions (with the special case of separable functions), candidate labels that are either layered or laminar, and constraints that specify sets of acceptable cardinalities for each label independently (with the special case of specifying intervals of acceptable values). (If we restricted our example to labels regarding gender and seniority level, we would have 2-layered labels, because there are two sets of labels, {male, female} and {junior, senior}, and each candidate has one label from each set. On the other hand, hierarchical labels, such as those regarding countries and continents, are 1-laminar; see the description of the model for more details.)

  2. We study the complexity of finding a diverse committee of a given size, depending on the type of the objective function, the type of the label structure, and the type of diversity constraints. While in most cases we find our problems to be NP-hard (even if we only want to check if a committee meeting the diversity constraints exists, without optimizing the objective function), we also find practically relevant cases with polynomial-time algorithms (e.g., our algorithms would suffice for the research-meeting example restricted to the constraints regarding the seniority level and gender). We provide approximation algorithms for some of our NP-hard problems.

  3. We study the complexity of recognizing various types of label structures. For example, given a set of labeled candidates, we ask if their labels have laminar or layered structure. It turns out that recognizing structures with three independent sets of labels is NP-hard, whereas recognizing up to two independent sets is polynomial-time computable.

  4. Finally, we introduce the concept of price of diversity, which quantifies the “cost” of introducing diversity constraints subject to the assumed objective function.

Our main results are presented in Table 1.

2 The Model

For $i, j \in \mathbb{N}$, we write $[i, j]$ to denote the set $\{i, i+1, \dots, j\}$. We write $[i]$ as an abbreviation for $[1, i]$. For a set $X$, we write $2^X$ to denote the family of all of its subsets. We first present our model in full generality and then describe the particular instantiations that we focus on in our analysis.

General Model

Let $C$ be a set of $m$ candidates and let $L$ be a set of labels (such as junior, senior, etc.). Each candidate is associated with a subset of these labels through a labeling function $\lambda : C \to 2^L$. We say that a candidate $c$ has label $\ell$ if $\ell \in \lambda(c)$, and we write $C_\ell$ to denote the set of all candidates that have label $\ell$.

A diversity specification is a function that, given a committee (i.e., a set of candidates), the set of labels, and the labeling function, provides a yes/no answer specifying if the committee is diverse. If a committee is diverse with respect to diversity specification $D$, then we say that it is $D$-diverse.

An objective function is a function $f : 2^C \to \mathbb{R}$ that associates each committee with a score. We assume that $f(\emptyset) = 0$ and that the function is monotone (i.e., for each two committees $A$ and $B$ such that $A \subseteq B$, it holds that $f(A) \le f(B)$). In other words, an empty committee has no value and extending a committee cannot hurt it.

Our goal is to find a committee of a given size that meets the diversity specification and that has the highest possible score according to the objective function.

Definition 1 (Diverse Committee Winner Determination (DCWD)).

Given a set of candidates $C$, a set of labels $L$, a labeling function $\lambda$, a diversity specification $D$, a desired committee size $k$, and an objective function $f$, find a committee $W \subseteq C$ with $|W| = k$ that achieves the maximum value $f(W)$ among all $D$-diverse size-$k$ committees.

The set of candidates, the set of labels, and the labeling function are specified explicitly (i.e., by listing all the candidates with all their labels). The encoding of the diversity specification and the objective function depends on the particular case (see discussions below). To consider the problem’s NP-hardness, we take its decision variant, where instead of asking for a $D$-diverse committee with the highest possible value of the objective function we ask if there exists a $D$-diverse committee with objective value at least $T$ (where the threshold $T$ is a part of the input).

We also consider the Diverse Committee Feasibility (DCF) problem, which takes the same input as the winner determination problem, but where we ask if any $D$-diverse committee of size $k$ exists, irrespective of its objective value. In other words, the feasibility problem is a special case of the decision variant of the winner determination problem, where we ask about a $D$-diverse committee with objective value greater than or equal to $0$. Thus, if the feasibility problem is NP-hard, then the analogous winner determination problem is NP-hard as well (and if the winner determination problem is polynomial-time computable, so is the feasibility problem).
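To make the two problem statements concrete, here is a minimal brute-force sketch (exponential in the committee size, so only meant for toy instances). The function names, the toy candidates, and the example constraint are illustrative assumptions and do not come from the paper; the diversity specification is passed as an arbitrary yes/no predicate, as in the general model.

```python
from itertools import combinations

def brute_force_dcwd(candidates, k, is_diverse, objective):
    """Return (committee, score) maximizing objective over diverse size-k committees.

    Returns (None, None) if no diverse committee of size k exists.
    """
    best, best_score = None, None
    for committee in combinations(candidates, k):
        w = frozenset(committee)
        if not is_diverse(w):
            continue
        score = objective(w)
        if best_score is None or score > best_score:
            best, best_score = w, score
    return best, best_score

def brute_force_dcf(candidates, k, is_diverse):
    """Feasibility only: does any diverse committee of size k exist?"""
    return any(is_diverse(frozenset(c)) for c in combinations(candidates, k))

# Hypothetical toy instance: four candidates with labels and separable weights.
toy_labels = {
    "a": {"junior", "female"},
    "b": {"senior", "male"},
    "c": {"senior", "female"},
    "d": {"junior", "male"},
}
toy_weights = {"a": 3, "b": 10, "c": 7, "d": 1}

def at_least_one_junior(w):
    return sum(1 for c in w if "junior" in toy_labels[c]) >= 1

def at_least_three_juniors(w):
    return sum(1 for c in w if "junior" in toy_labels[c]) >= 3

def total_weight(w):
    return sum(toy_weights[c] for c in w)

best, score = brute_force_dcwd(sorted(toy_labels), 2, at_least_one_junior, total_weight)
```

Feasibility is the same search with the objective ignored, which mirrors the observation that DCF is the decision variant of DCWD with threshold 0.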

The model, as specified above, is far too general to obtain any sort of meaningful computational results. Below we specify the restrictions that we study.

Objective Functions

An objective function $f$ is submodular if for each two committees $S$ and $S'$ such that $S \subseteq S'$ and each $c \in C \setminus S'$ it holds that $f(S \cup \{c\}) - f(S) \ge f(S' \cup \{c\}) - f(S')$. For two sets of candidates $X$ and $Y$, we write $f(X \mid Y)$ to denote the marginal contribution of the candidates from $X$ with respect to those in $Y$. Formally, we have $f(X \mid Y) = f(X \cup Y) - f(Y)$. Submodular functions are very common and suffice to express many natural problems. We assume all our objective functions to be submodular.

Example 1.

Consider the following voting scenario. Let $C$ be a set of $m$ candidates and $V$ a set of $n$ voters, where each voter ranks all the candidates from best to worst. We write $\mathrm{pos}_v(c)$ to denote the position of candidate $c$ in the ranking of voter $v$ (the best candidate is ranked on position $1$, the next one on position $2$, and so on). The Borda score associated with position $i$ (among $m$ possible ones) is $\beta_m(i) = m - i$. Under the Chamberlin–Courant rule (CC), the score of a committee $W$ is defined by the objective function $f_{\mathrm{CC}}(W) = \sum_{v \in V} \beta_m\big(\min_{c \in W} \mathrm{pos}_v(c)\big)$. Intuitively, this function associates each voter with her representative (the member of the committee that the voter ranks highest) and defines the score of the committee as the sum of the Borda scores of the voters’ representatives. It is well-known that this function is submodular (Lu and Boutilier, 2011). The CC rule outputs those committees (of a given size $k$) for which the CC objective function gives the highest value (and, intuitively, where each voter is represented by a committee member that the voter ranks highly).
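As a small illustration of the Chamberlin–Courant objective, the sketch below scores a committee by summing, over voters, the Borda score of each voter's highest-ranked committee member. It assumes the common convention that position i among m candidates is worth m − i points; the rankings and function names are made up for the example.

```python
def borda(m, position):
    """Borda score of a 1-based position among m candidates (assumed: m - position)."""
    return m - position

def cc_score(rankings, committee):
    """Chamberlin-Courant score: each voter counts only her best-ranked committee member."""
    if not committee:
        return 0  # the empty committee has value 0
    m = len(rankings[0])
    total = 0
    for ranking in rankings:
        best_position = min(ranking.index(c) + 1 for c in committee)
        total += borda(m, best_position)
    return total

# Hypothetical profile: three voters ranking four candidates from best to worst.
example_rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["d", "c", "b", "a"],
]

score_ad = cc_score(example_rankings, {"a", "d"})
score_ab = cc_score(example_rankings, {"a", "b"})

# One instance of the diminishing-returns (submodularity) inequality:
gain_small = cc_score(example_rankings, {"a", "d"}) - cc_score(example_rankings, {"a"})
gain_large = cc_score(example_rankings, {"a", "b", "d"}) - cc_score(example_rankings, {"a", "b"})
```

In this profile adding candidate d to {a} helps more than adding d to the larger committee {a, b}, which is the behavior that submodularity requires.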

As a special case of submodular functions, we also consider separable functions. A function $f$ is separable if for every candidate $c$ there is a weight $w_c$ such that the value of a committee $W$ is given as $f(W) = \sum_{c \in W} w_c$. While separable functions are very restrictive, they are also very natural.

Example 2.

Consider the setting from Example 1, but with objective function $f_{k\mathrm{B}}(W) = \sum_{v \in V} \sum_{c \in W} \beta_m(\mathrm{pos}_v(c))$. This function sums Borda scores of all the committee members from all the voters and models the $k$-Borda voting rule (the committee with the highest score is selected). The function is separable as for each candidate $c$ it suffices to take $w_c = \sum_{v \in V} \beta_m(\mathrm{pos}_v(c))$. It is often argued that $k$-Borda is a good rule when our goal is to shortlist a set of individually excellent candidates (Faliszewski et al., 2017).
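The separability of the k-Borda objective can be seen directly in code: one weight per candidate is computed once, and every committee score is then a plain sum. This is a sketch under the same assumed Borda convention (position i among m candidates scores m − i); the profile and names are illustrative only.

```python
def borda_weights(rankings):
    """Per-candidate weight: total Borda score received from all voters."""
    m = len(rankings[0])
    weights = {c: 0 for c in rankings[0]}
    for ranking in rankings:
        for position, candidate in enumerate(ranking, start=1):
            weights[candidate] += m - position
    return weights

def k_borda_score(weights, committee):
    """Separable objective: the committee score is the sum of its members' weights."""
    return sum(weights[c] for c in committee)

# Hypothetical profile: three voters ranking four candidates from best to worst.
profile = [
    ["a", "b", "c", "d"],
    ["b", "a", "c", "d"],
    ["d", "c", "b", "a"],
]
kb_weights = borda_weights(profile)
kb_score_ab = k_borda_score(kb_weights, {"a", "b"})
```

Because the score decomposes per candidate, without diversity constraints the best size-k committee simply consists of the k candidates with the largest weights.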

Together, Example 1 and Example 2 show that our model suffices to capture many well-known multiwinner voting scenarios. Many other voting rules, such as Proportional Approval Voting, or many committee scoring rules, can be expressed through submodular objective functions (Skowron, Faliszewski, and Lang, 2016; Faliszewski et al., 2016).

Diversity Specifications

We focus on diversity specifications that regard each label independently. In other words, the answer to the question of whether a given committee $W$ is diverse depends only on the cardinalities of the sets $W \cap C_\ell$.

Definition 2.

For a set of candidates $C$, a set of labels $L$, and a labeling function $\lambda$, we say that a diversity specification $D$ is independent (consists of independent constraints) if and only if there is a function $b : L \to 2^{\{0, \dots, |C|\}}$ (referred to as the cardinality constraint function) such that a committee $W$ is diverse exactly if for each label $\ell \in L$ it holds that $|W \cap C_\ell| \in b(\ell)$.

If we have $m$ candidates then specifying independent constraints requires providing at most $m + 1$ numbers for each label (the allowed cardinalities between $0$ and $m$). Thus independent constraints can easily be encoded in the inputs for our algorithms.

Independent constraints are quite expressive. For example, they are sufficient to express conditions such as “the committee must contain an even number of junior researchers” or, since our committees are of a given fixed size, conditions of the form “the committee must contain at least females.” Indeed, the conditions of the latter form are so important that we consider them separately.

Definition 3.

For a set of candidates $C$, a set of labels $L$, and a labeling function $\lambda$, we say that a diversity specification $D$ is interval-based (consists of interval constraints) if and only if there are functions $b_1, b_2 : L \to \{0, \dots, |C|\}$ (referred to as the lower and upper interval constraint functions) such that a committee $W$ is diverse if and only if for each label $\ell \in L$ it holds that $b_1(\ell) \le |W \cap C_\ell| \le b_2(\ell)$.
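The two kinds of constraints can be checked with a few lines of code. The sketch below represents independent constraints as a set of allowed counts per label and interval constraints as a pair of lower and upper bounds per label; all identifiers and the toy data are assumptions made for illustration.

```python
def label_counts(committee, labeling, labels):
    """Number of committee members carrying each label."""
    return {l: sum(1 for c in committee if l in labeling[c]) for l in labels}

def satisfies_independent(committee, labeling, allowed):
    """allowed[label] is the set of acceptable cardinalities for that label."""
    counts = label_counts(committee, labeling, allowed)
    return all(counts[l] in allowed[l] for l in allowed)

def satisfies_interval(committee, labeling, lower, upper):
    """lower[label] <= count <= upper[label] must hold for every label."""
    counts = label_counts(committee, labeling, lower)
    return all(lower[l] <= counts[l] <= upper[l] for l in lower)

def interval_to_independent(lower, upper):
    """Interval constraints are a special case of independent constraints."""
    return {l: set(range(lower[l], upper[l] + 1)) for l in lower}

# Hypothetical data: four researchers with seniority labels.
staff = {"a": {"junior"}, "b": {"junior"}, "c": {"senior"}, "d": {"senior"}}
even_juniors = {"junior": {0, 2}, "senior": {0, 1, 2}}
lo = {"junior": 1, "senior": 0}
hi = {"junior": 2, "senior": 1}
```

The conversion helper shows why every interval constraint is also an independent constraint, while a condition such as "an even number of juniors" has no interval form.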

Label Structures

In principle, our model allows each candidate to have an arbitrary set of labels. In practice, there usually are some dependencies between the labels and these dependencies can have a strong impact on the complexity of our problem. We focus on labels that are arranged in independent, possibly hierarchically structured, layers.

Let $C$ be a set of candidates, let $L$ be a set of labels, and let $\lambda$ be a labeling function. We say that $\lambda$ has $1$-layered structure (i.e., we have a $1$-layered labeling) if for each two distinct labels $x, y \in L$ it holds that $C_x \cap C_y = \emptyset$ (i.e., each candidate has at most one of these labels). For example, if we restricted the example from the introduction to labels regarding the seniority level (junior or senior), then we would have a $1$-layered labeling.

More generally, we say that a labeling is $1$-laminar if for each two distinct labels $x, y \in L$ we have that either (a) $C_x \subseteq C_y$ or (b) $C_y \subseteq C_x$ or (c) $C_x \cap C_y = \emptyset$. In other words, $1$-laminar labelings allow the labels to be arranged hierarchically.
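Both structural conditions are pairwise conditions on the candidate sets of the labels, so they can be tested directly. The following sketch checks them by comparing every pair of labels; the helper names and the three toy labelings (including the country names) are hypothetical.

```python
from itertools import combinations

def candidates_by_label(labeling):
    """Invert the labeling: map each label to the set of candidates that carry it."""
    by_label = {}
    for candidate, labels in labeling.items():
        for label in labels:
            by_label.setdefault(label, set()).add(candidate)
    return by_label

def is_1_layered(labeling):
    """True if the candidate sets of any two distinct labels are disjoint."""
    sets = candidates_by_label(labeling).values()
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

def is_1_laminar(labeling):
    """True if any two label sets are nested or disjoint."""
    sets = candidates_by_label(labeling).values()
    return all(a <= b or b <= a or a.isdisjoint(b) for a, b in combinations(sets, 2))

# Hypothetical labelings.
geo = {"c1": {"Germany", "Europe"}, "c2": {"France", "Europe"}, "c3": {"Japan", "Asia"}}
mixed = {"c1": {"junior", "female"}, "c2": {"junior", "male"}, "c3": {"senior", "female"}}
seniority = {"c1": {"junior"}, "c2": {"senior"}, "c3": {"senior"}}
```

The geographic labeling is laminar but not layered, while mixing gender and seniority in a single layer breaks both properties because the label sets overlap without being nested.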

Example 3.

Consider a set of five candidates and labels that encode the countries and continents where the candidates come from. Specifically, there are four countries , and two continents and . The candidates are labeled as follows:

Figure 1(a) illustrates the 1-laminar inclusion-wise relations between the labels (there can be more levels of the hierarchy; for example, for each country there could be labels specifying local administrative divisions).

Figure 1: Illustration of a 1-laminar labeling structure. (a) 1-laminar labeling. (b) Tree representation.

Every $1$-laminar labeling, together with the set of candidates, can be represented as a rooted tree in the following way: For a pair of distinct labels $x, y$ we create an arc from $x$ to $y$ if $C_y \subsetneq C_x$ and there is no label $z$ such that $C_y \subsetneq C_z \subsetneq C_x$. We add a root label $r$ and we impose that each candidate has this label; we add an arc from $r$ to each label without an incoming arc. The resulting digraph is clearly a rooted tree. See Figure 1(b) for an illustration.

For each positive integer $t$, we say that a labeling is $t$-layered (respectively, $t$-laminar) if the set of labels $L$ can be partitioned into sets $L_1, \dots, L_t$ such that for each $i \in [t]$, the labeling restricted to the labels from $L_i$ is $1$-layered (respectively, $1$-laminar).

Example 4.

In the example from the introduction, restricting our attention to candidates’ gender and seniority levels, we get a $2$-layered labeling structure. If we also consider labels regarding countries and continents, then we get a $3$-laminar structure (however, only the geographic labels would be using the full power of laminar labelings).

We assume that when we are given a $t$-layered ($t$-laminar) labeling structure, we are also given the partition of the set of labels that defines this structure (in Section 5 we analyze the problem of recognizing such structures algorithmically).

Balanced Committee Model

As a very natural special case of our model we consider the problem of computing balanced committees. In this case there are only two labels (e.g., male and female), each candidate has exactly one label, and the constraint specification is that we need to select exactly the same number of candidates with either label (thus, by definition, the committee must be of an even size).

Computing balanced committees is a very natural problem. For example, seeking gender balance is a common requirement in many settings. In this paper, we seek exact balance (that is, we seek exactly the same number of candidates with either label) but allowing any other proportion would lead to similar results.
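For intuition, the balanced case is easy when the objective is separable: the two halves of the committee do not interact, so one can take the top-weighted candidates from each group. The sketch below assumes a separable objective given by per-candidate weights; the function name and data are illustrative and not taken from the paper.

```python
def best_balanced_committee(weights, label_of, k):
    """Pick the k/2 heaviest candidates from each of the two labels.

    Assumes a separable objective (score = sum of weights) and exactly two labels.
    Returns None if some label has fewer than k/2 candidates.
    """
    labels = sorted(set(label_of.values()))
    if k % 2 != 0 or len(labels) != 2:
        raise ValueError("need an even committee size and exactly two labels")
    committee = set()
    for label in labels:
        group = sorted(
            (c for c in weights if label_of[c] == label),
            key=lambda c: weights[c],
            reverse=True,
        )
        if len(group) < k // 2:
            return None  # not enough candidates with this label
        committee.update(group[: k // 2])
    return committee

# Hypothetical instance: three male and two female candidates.
bal_weights = {"a": 10, "b": 9, "c": 8, "d": 2, "e": 1}
bal_labels = {"a": "male", "b": "male", "c": "male", "d": "female", "e": "female"}
```

With a non-separable submodular objective such as Chamberlin–Courant this shortcut no longer applies, since the value of a candidate depends on who else is selected.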

3 Separable Objective Functions

Separable objective functions form a simple, but very important special case of our setting. Indeed, such functions are very natural in shortlisting examples, where diversity constraints are used to implement, e.g., affirmative action or employment-equity laws. We organize our discussion with respect to the type of constraint specifications.

Independent Constraints

It turns out that independent constraints are quite difficult to work with. If the labels are $1$-laminar then polynomial-time algorithms exist (both for deciding if feasible committees exist and for computing optimal ones), but with $2$-layered labelings our problems become NP-hard (recall that $2$-layered labelings are a special case of $2$-laminar ones). Our polynomial-time algorithms proceed via dynamic programming and hardness proofs use reductions from Exact Cover by 3-Sets (X3C).

Theorem 4.

Let $D$ be a diversity specification of independent constraints. Suppose that $\lambda$ is $1$-laminar and $f$ is separable. Then, both DCWD and DCF can be solved in polynomial time. If $\lambda$ is $2$-layered then both problems are NP-hard (even if each candidate has at most two labels, and each label is associated with at most three candidates).

Proof.

We first consider the case where is -laminar and we give a polynomial-time algorithm.

Let be the cardinality constraint function corresponding to diversity specification  of the input. Let be a rooted tree representation for ; we denote by the root label that corresponds to the constraint on the size of the whole committee, i.e., and . Additionally, we add for every non-leaf label  in the tree representation of the labeling structure (including the possibly newly created ) an artificial label  with  and add this label to every candidate that has label  but none of the (original) child-labels of . This step clearly does not influence the solvability of our problem but ensures that every candidate has at least one label that is a leaf node in the tree representation of the labeling structure.

For each , we denote by the th child of  in , and by number of children of  in . By we denote the set of all descendants of  (including itself, i.e., ). Furthermore, let  be the candidate from  with the th largest value according to . For technical reasons, we introduce the  symbol as placeholder for a non-existing (sub)committee and define for any set . We set  and .

We describe a dynamic programming algorithm that solves DCWD using the integer table  where contains a (sub)committee  with maximum total score  among all committees that consist of  candidates with labels from  such that for all and .

It is not hard to verify that the overall solution for the DCWD instance can be read from the table as .

We will now show how to compute the table  in a bottom-up manner. For each leaf label-node  we set  if , that is, is the set of the  “best” candidates with label  and, otherwise, we set . For each inner label-node , we set  to where

if and, otherwise, we set . Further, for each inner label-node  and , we set  to where if and, otherwise, we set .

As for the running time, sorting the candidates with respect to their value according to  takes time. The table is of size  and computing a single table entry takes at most  time. The overall running time is which is polynomial since .

For DCF, we can skip sorting the candidates, which leads to the improved running time .

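Let us now consider the second part of the theorem. We use a reduction from the NP-hard Exact Cover by 3-Sets (X3C) problem which, given a finite set $X$ and a collection $\mathcal{S}$ of size-$3$ subsets of $X$, asks whether there is a subcollection $\mathcal{S}' \subseteq \mathcal{S}$ that partitions $X$, that is, each element of $X$ is contained in exactly one subset from $\mathcal{S}'$. The reduction is similar to the reduction of the somewhat closely related General Factor problem (Cornuéjols, 1988) and works as follows: Create one element label $\ell_x$ for each element $x \in X$ and one set label $\ell_S$ for each subset $S \in \mathcal{S}$. We set $b(\ell_x) = \{1\}$ for each $x \in X$ and $b(\ell_S) = \{0, 3\}$ for each $S \in \mathcal{S}$. For each subset $S = \{x, y, z\} \in \mathcal{S}$, create three candidates $c_{S,x}$, $c_{S,y}$, and $c_{S,z}$ labeled with $\{\ell_S, \ell_x\}$, $\{\ell_S, \ell_y\}$, and $\{\ell_S, \ell_z\}$, respectively. Finally, set the committee size $k = |X|$. This completes the construction, which can clearly be performed in polynomial time. For the correctness, assume that there is a subcollection $\mathcal{S}' \subseteq \mathcal{S}$ that partitions $X$. It is easy to verify that $\{c_{S,x} : S \in \mathcal{S}', x \in S\}$ is a $D$-diverse committee. Conversely, let $W$ be an arbitrary $D$-diverse committee. Now, $\mathcal{S}' = \{S \in \mathcal{S} : c_{S,x} \in W \text{ for some } x \in S\}$ partitions $X$: each element $x$ is covered exactly once since $b(\ell_x) = \{1\}$ for all $x \in X$, and $\mathcal{S}'$ is pairwise disjoint since $b(\ell_S) = \{0, 3\}$ for all $S \in \mathcal{S}$. ∎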

Given the above hardness results, it is natural to ask about the parameterized complexity of our problems because in many settings the label structures are very limited (for example, the 2-layered gender/seniority labeling from the introduction contains only 4 labels and already is very relevant for practical applications). Unfortunately, for independent constraints our problems remain hard when parameterized by the number of labels.

Theorem 5.

Both DCF and DCWD problems are W[1]-hard with respect to the number of labels $|L|$, even if $D$ is a diversity specification of independent constraints.

Proof.

We describe a parameterized reduction from the W[1]-hard Multicolored Clique (MCC) problem which, given an undirected graph , a non-negative integer , and a vertex coloring , asks whether graph  admits a colorful -clique, that is, a size- vertex subset  such that the vertices in  are pairwise adjacent and have pairwise distinct colors. Without loss of generality, we assume that the number of vertices from each color class equals some integer . Let be an MCC instance. We denote the set of vertices of color  as . We construct a DCF instance as follows.

Labels.  For each color  we have a lower vertex label  and a higher vertex label . For each (unordered) color pair  we have an edge label . (So that is obviously upper-bounded by some function in .)

Candidates and Labeling.  For each color  and each vertex  we introduce  lower color--selection candidates and  higher color--selection candidates. The labeling function  is defined as follows. For each lower color--selection candidate  we have . For each higher color--selection candidate  we have . Introduce further  dummy candidates  with .

Diversity Constraints.  We define the cardinality constraint function  as follows. For each color  we set  and set . For each (unordered) color pair  we set there is an edge between the th vertex from  and the th vertex from .

We finally set the committee size . This completes the reduction which clearly runs in polynomial time. It remains to show that the graph  has a colorful -clique if and only if the constructed DCF instance admits a diverse committee.

Assume that  has a colorful -clique . Let denote the index of the color  vertex from , that is, if and only if  contains the th vertex of color . It is not hard to verify that a diverse committee can be constructed as follows. Start with a committee that consists only of  dummy candidates. For each color  replace  dummy candidates by lower color -selection candidates and replace  dummy candidates by higher color -selection candidates. The diversity constraints of the lower and higher vertex labels are clearly fulfilled by this construction. Now, consider some edge label . Our construction ensures that there are exactly  lower color--selection candidates in the committee with label  and further  higher color--selection candidates with label  (and no further candidates with label ). Since  is a clique, we know that the th vertex of color  is adjacent to the th vertex of color  and thus . Thus, also the diversity constraints for the edge labels are fulfilled and the committee is indeed diverse.

Finally, assume that the constructed DCF instance admits some diverse committee. To fulfill the diversity constraints for the lower vertex labels for each color  there is some number  such that there are exactly  lower color--selection candidates and further  higher color--selection candidates in the committee. (The former is directly enforced by the diversity constraints for the lower vertex labels and the latter follows then immediately from the diversity constraints for the higher vertex labels.) We claim that  is an -colored clique. It is clear from the definition of  that and that  is -colored but it remains to show that  is indeed a clique. To show this, suppose towards a contradiction that there are two colors  such that vertex  and vertex  are not adjacent. Now, there are exactly  lower color--selection candidates in the committee with label  and further  higher color--selection candidates with label  (and no further candidates with label ). Furthermore, since the diversity constraint of label  is fulfilled, it must hold that and so that vertex  and vertex  are adjacent—a contradiction. ∎

However, not all is lost and sometimes brute-force algorithms are sufficiently effective. For example, if we have a -layered labeling (where is a small constant) then each candidate has at most different labels and it suffices to consider each size- labeling separately. A brute-force algorithm based on this idea suffices, e.g., for the example from the seniority/specialty labels from the introduction (it would have running time, because there are combinations of labels and ; the algorithm could also deal with non-independent constraints).

Interval constraints

Interval constraints are more restrictive than general independent ones, but usually suffice for practical applications and are more tractable. For example, for the case of $1$-laminar labelings we give a linear-time algorithm for recognizing if a feasible committee exists (for independent constraints, our best algorithm for this task is quadratic).

Theorem 6.

Let $D$ be a diversity specification of interval constraints. If $\lambda$ is $1$-laminar, then DCF can be solved in linear time.

Proof.

Let and denote the lower and upper interval constraint functions, respectively. Let be a rooted tree representation for ; we denote by the root label that corresponds to the size constraint on the whole committee size, i.e., and . For each , we denote by the set of children of and by the set of descendants of in (including ).

For every label , let be the set of committees satisfying the constraints up until , namely, for each ; then, we define (respectively, ) to be the minimum (respectively, the maximum) value for which there is a set with . If there exists no committee satisfying the aforementioned constraints we set and to and , respectively. Clearly, there is a -diverse committee if and only if . The values and can be efficiently computed by a dynamic programming in a bottom-up manner as follows.

For each leaf at , we set and if ; we set and otherwise. For each internal node , we set and if

  • for all child ; and

  • there are enough candidates in to fill in the lower bound () and the lower bound does not exceed the upper bound, i.e.,

Otherwise, we set and . This can be done in time since each for can be computed in time and since the size of the dynamic programming table is at most and each entry can be filled in constant time.

Now we will show by induction that for each and each , there is a set of size if and only if . The claim is immediate when . Now consider an internal node and suppose that the claim holds for all .

Suppose first that . By induction hypothesis, for each child , there is a committee where for any . By combining all such committees, we have that for any , there is a committee of size such that for all . In particular, since , there is a set of size such that for all . Since , we have .

Conversely, suppose that does not belong to the interval . Suppose towards a contradiction that there is a set of size . Notice that for each , it holds that and hence by induction hypothesis. If or , it is clear that , a contradiction. Further, if , cannot be a subset of , a contradiction. If , then for some label ; however, since , we have by induction hypothesis, a contradiction. A similar argument leads to a contradiction if we assume . ∎
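The bottom-up idea behind this proof can be sketched in a few lines. The code below is not the paper's exact recurrence but a minimal variant under stated assumptions: the label tree is given as a `children` map, `members[label]` is the set of candidates with that label, a designated root label contains every candidate, and for each label we compute the interval of achievable counts of committee members inside it. All identifiers and the toy instance are hypothetical.

```python
def feasible_range(label, children, members, lower, upper):
    """Interval (lo, hi) of achievable member counts inside `label`, or None if infeasible.

    Child label sets are assumed to be disjoint subsets of the parent's set.
    """
    lo_sum = hi_sum = covered = 0
    for child in children.get(label, []):
        child_range = feasible_range(child, children, members, lower, upper)
        if child_range is None:
            return None
        lo_sum += child_range[0]
        hi_sum += child_range[1]
        covered += len(members[child])
    # Candidates with this label that belong to no child label can be chosen freely.
    free = len(members[label]) - covered
    lo = max(lower[label], lo_sum)
    hi = min(upper[label], hi_sum + free)
    return (lo, hi) if lo <= hi else None

def dcf_interval_1_laminar(root, k, children, members, lower, upper):
    """Is there a committee of size k meeting all interval constraints?"""
    root_range = feasible_range(root, children, members, lower, upper)
    return root_range is not None and root_range[0] <= k <= root_range[1]

# Hypothetical instance with countries nested inside continents.
members = {
    "root": {1, 2, 3, 4, 5},
    "Europe": {1, 2, 3},
    "Germany": {1, 2},
    "France": {3},
    "Asia": {4, 5},
}
children = {"root": ["Europe", "Asia"], "Europe": ["Germany", "France"]}
lower = {"root": 0, "Europe": 2, "Germany": 1, "France": 0, "Asia": 0}
upper = {"root": 5, "Europe": 3, "Germany": 1, "France": 1, "Asia": 1}
```

Each label is visited once, so the number of arithmetic steps is linear in the number of labels once the set sizes are known; the achievable counts form an interval at every node because sums of intervals over disjoint children are again intervals.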

For the case of computing the winning committee we no longer obtain a significant speedup from focusing on interval constraints, but we do get a much better structural understanding of the problem. In particular, we can use a greedy algorithm instead of relying on dynamic programming. Briefly put, our algorithm (presented as Algorithm 1) starts with an empty committee and performs $k$ iterations (where $k$ is the desired committee size), in each extending the committee with a candidate that increases the score maximally, while ensuring that the committee can still be extended to one that meets the diversity constraints. To show that this greedy algorithm is correct and that it can be implemented efficiently, we use some notions from matroid theory.

Formally, a matroid is an ordered pair $(E, \mathcal{I})$, where $E$ is some finite set and $\mathcal{I}$ is a family of its subsets (referred to as the independent sets of the matroid). We require that (I1) $\emptyset \in \mathcal{I}$, (I2) if $A \subseteq B \in \mathcal{I}$, then $A \in \mathcal{I}$, and (I3) if $A, B \in \mathcal{I}$ and $|A| > |B|$, then there exists $e \in A \setminus B$ such that $B \cup \{e\} \in \mathcal{I}$. The maximal (with respect to inclusion) independent sets of a matroid are called its bases. Many of our arguments use results from matroid theory, though often in contexts very different from those for which they were originally developed. In particular, the next theorem, in essence, translates the results of Yokoi (2017) to our setting.

Theorem 7.

Let $D$ be a diversity specification of interval constraints. Suppose that $\lambda$ is 1-laminar, and $f$ is a separable function given by a weight vector $w$. Then, DCWD can be solved in polynomial time.

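Algorithm 1: Greedy Algorithm
notation: $\mathcal{F}$ is the set of $D$-diverse, size-$k$ committees; $\hat{\mathcal{F}}$ is its lower extension.
input: $f$: the objective function; $k$: the size of the committee.
output: $W \in \mathcal{F}$
set $W = \emptyset$;
while $|W| < k$ do
      choose a candidate $c \in C \setminus W$ such that $W \cup \{c\} \in \hat{\mathcal{F}}$ with the maximum improvement $f(W \cup \{c\}) - f(W)$;
      set $W = W \cup \{c\}$;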
Proof.

Let $\mathcal{F}$ be the set of $D$-diverse committees of size $k$ and assume that $\mathcal{F}$ is nonempty. For a family $\mathcal{F}$ of subsets of a finite set $E$, we define its lower extension by $\hat{\mathcal{F}} = \{ S \subseteq E : S \subseteq T \text{ for some } T \in \mathcal{F} \}$. (Note that the lower extension does not necessarily ignore the lower bounds. For instance, consider the case where we want to select a committee of size $3$ such that there are exactly three female candidates and at most two male candidates; the corresponding lower extension only includes the sets of female candidates of size at most $3$, whereas a male-only committee of size $2$ satisfies the upper bounds.)

It is known that if our constraints are given by intervals, the lower extension of comprises the independent sets of a matroid whenever (Yokoi, 2017). Thus, the greedy algorithm (Algorithm 1) finds an optimal solution (see, e.g., the book of Korte and Vygen (2006), Chapter 13). By construction, and hence is a maximal element in , which follows that . Since , we have . Further, Yokoi (2017) showed that checking whether a set belongs to can be efficiently done by maintaining a set with ; thus, the greedy algorithm runs in polynomial time

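Now it remains to analyze the running time of the algorithm. Sorting the candidates from the best to the worst requires $O(m \log m)$ time, where $m$ is the number of candidates, given a weight vector $w$. In each step, we need to check whether a set $W \cup \{c\}$ belongs to the lower extension $\mathcal{K}^{\downarrow}$ of the family $\mathcal{K}$ of diverse size-$k$ committees. Yokoi (2017) showed that this can be efficiently done by maintaining a set $S \in \mathcal{K}$ with $W \subseteq S$. Specifically, the following lemma holds.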

Lemma 8 (Lemma of Yokoi, 2017).

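Let $(E, \mathcal{I})$ be a matroid. Let $I$ be an independent set of the matroid, $B$ be a basis with $I \subseteq B$, and $e \in E \setminus I$. Then, $I \cup \{e\}$ is independent if and only if $e \in B$ or $(B \setminus \{e'\}) \cup \{e\}$ is a basis for some $e' \in B \setminus I$.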

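The lemma implies that, provided a set $S \in \mathcal{K}$ with $W \subseteq S$, deciding whether $W \cup \{c\}$ belongs to the lower extension $\mathcal{K}^{\downarrow}$ can be done by checking whether $c \in S$ or $(S \setminus \{c'\}) \cup \{c\} \in \mathcal{K}$ for some $c' \in S \setminus W$; this can be done in polynomial time. One can maintain such a superset $S$ of $W$ by first computing a set $S \in \mathcal{K}$ in polynomial time as we have proved in Theorem 6, and updating the set in each step as follows, where $c$ is the candidate added to $W$ in that step: If $c \notin S$, then find a candidate $c' \in S \setminus W$ such that $(S \setminus \{c'\}) \cup \{c\} \in \mathcal{K}$, and set $S \leftarrow (S \setminus \{c'\}) \cup \{c\}$; otherwise, we do not change the set $S$. Since there are at most $k$ iterations, the greedy algorithm runs in polynomial time. ∎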
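The bookkeeping described in this proof can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: it assumes one label per candidate, takes an initial diverse committee as a hypothetical `seed` argument (standing in for the feasibility algorithm of Theorem 6), and scans the candidates once in non-increasing weight order, which is the usual way to run the greedy algorithm on a matroid. The helper names are illustrative.

```python
def satisfies(committee, labels, k, bounds):
    """Check the size and the interval constraints of a full committee."""
    if len(committee) != k:
        return False
    counts = {}
    for c in committee:
        counts[labels[c]] = counts.get(labels[c], 0) + 1
    return all(lo <= counts.get(l, 0) <= hi for l, (lo, hi) in bounds.items())

def greedy_with_witness(weights, labels, k, bounds, seed):
    """Greedy selection that keeps a diverse committee `full` containing the
    partial committee, so each feasibility test needs only single swaps."""
    full = frozenset(seed)
    assert satisfies(full, labels, k, bounds), "seed must be diverse"
    committee = frozenset()
    for c in sorted(weights, key=weights.get, reverse=True):
        if len(committee) == k:
            break
        if c in full:
            # The witness already contains c, so adding c is feasible.
            committee = committee | {c}
            continue
        # Otherwise look for one member of full \ committee to swap out.
        for out in full - committee:
            swapped = (full - {out}) | {c}
            if satisfies(swapped, labels, k, bounds):
                full = swapped
                committee = committee | {c}
                break
    return committee

weights = {"a": 10, "b": 9, "c": 8, "d": 3, "e": 2}
labels = {"a": "m", "b": "m", "c": "m", "d": "f", "e": "f"}
bounds = {"f": (1, 3), "m": (0, 3)}

result = greedy_with_witness(weights, labels, 3, bounds, seed={"c", "d", "e"})
print(sorted(result))
```

Each feasibility test inspects at most $k$ single-element swaps of the witness, instead of searching over all diverse committees.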

Unfortunately, the greedy algorithm does not work for more involved labeling structures, but for $2$-laminar labelings we can compute winning committees by reducing the problem to the matroid intersection problem (Edmonds, 1979). For more involved labeling structures our problems become $\mathrm{NP}$-hard.

Theorem 9.

Let the diversity specification consist of interval constraints. Suppose that the labeling is 2-laminar and the objective function is separable. Then, DCF can be solved in polynomial time, and DCWD can be solved in polynomial time as well.

In the subsequent proof, we will use the following notions and results in matroid theory: Given a matroid $(E, \mathcal{I})$, the sets in $2^{E} \setminus \mathcal{I}$ are called dependent, and a minimal dependent set of a matroid is called a circuit. Crucial properties of circuits are the following.

Lemma 10.

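Let $(E, \mathcal{I})$ be a matroid, $I \in \mathcal{I}$, and $e \in E \setminus I$ such that $I \cup \{e\} \notin \mathcal{I}$. Then the set $I \cup \{e\}$ contains a unique circuit.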

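We write $C(I, e)$ for the unique circuit in $I \cup \{e\}$. The set $C(I, e)$ can be characterized by the elements that can replace $e$, i.e., for each independent set $I$ of a matroid $(E, \mathcal{I})$ and $e \in E \setminus I$ with $I \cup \{e\} \notin \mathcal{I}$,

$$C(I, e) = \{\, x \in I \cup \{e\} : (I \cup \{e\}) \setminus \{x\} \in \mathcal{I} \,\}.$$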
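As a small sanity check of this characterization, the following Python sketch computes the unique circuit through single-element removals and compares it with a brute-force search for minimal dependent sets. It assumes a partition matroid given by hypothetical `block` and `cap` dictionaries; all names are illustrative.

```python
from itertools import combinations

def independent(s, block, cap):
    """Partition matroid oracle: at most cap[b] elements from each block b."""
    counts = {}
    for x in s:
        counts[block[x]] = counts.get(block[x], 0) + 1
    return all(n <= cap[b] for b, n in counts.items())

def circuit(ind_set, e, oracle):
    """Unique circuit in ind_set + e, via the exchange characterization."""
    union = frozenset(ind_set) | {e}
    assert not oracle(union), "ind_set + e must be dependent"
    # x lies on the circuit iff removing x restores independence.
    return frozenset(x for x in union if oracle(union - {x}))

def minimal_dependent_subsets(s, oracle):
    """Brute-force list of all circuits contained in s, for cross-checking."""
    subsets = [frozenset(c) for r in range(1, len(s) + 1)
               for c in combinations(sorted(s), r)]
    dependent = [t for t in subsets if not oracle(t)]
    return [t for t in dependent if not any(u < t for u in dependent)]

block = {"a1": "A", "a2": "A", "b1": "B"}
cap = {"A": 1, "B": 1}
oracle = lambda s: independent(s, block, cap)

found = circuit({"a1", "b1"}, "a2", oracle)
print(sorted(found))
```

Here adding `a2` to the independent set `{a1, b1}` overfills block `A`, and the circuit consists of exactly the two elements of that block.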

The following lemma by Frank (1981) serves as a fundamental property for proving the matroid intersection theorem.

Lemma 11 (Frank, 1981).

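Let $(E, \mathcal{I})$ be a matroid and $I \in \mathcal{I}$. Let $x_1, \ldots, x_s \in I$ and $y_1, \ldots, y_s \in E \setminus I$ where $I \cup \{y_j\} \notin \mathcal{I}$ for $j = 1, \ldots, s$ (so that $I \cup \{y_j\}$ contains a unique circuit $C(I, y_j)$). Suppose that

  • $x_j \in C(I, y_j)$ for $j = 1, \ldots, s$ and

  • $x_i \notin C(I, y_j)$ for $1 \leq i < j \leq s$.

Then, $(I \setminus \{x_1, \ldots, x_s\}) \cup \{y_1, \ldots, y_s\} \in \mathcal{I}$.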

Now we are ready to prove Theorem 9.

Proof.

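Let $b_1$ and $b_2$ denote the lower and upper interval constraint functions, respectively. Let $L_1, L_2$ be a partition of the set of labels such that for each $i \in \{1, 2\}$, the labeling restricted to the labels from $L_i$ is $1$-laminar. For $i \in \{1, 2\}$, we denote by $\mathcal{K}_i$ the set of committees of size $k$ satisfying the constraints in $L_i$, i.e.,

$$\mathcal{K}_i = \{\, W : |W| = k \text{ and } b_1(\ell) \leq |W_\ell| \leq b_2(\ell) \text{ for all } \ell \in L_i \,\},$$

where $W_\ell$ denotes the set of members of $W$ that have label $\ell$. If at least one of them is empty, then there is no diverse committee; thus we assume otherwise. We have argued that the lower extension $\mathcal{K}_i^{\downarrow}$ (the family of all subsets of committees in $\mathcal{K}_i$) for each $i \in \{1, 2\}$ forms the independent sets of a matroid when the labeling is $1$-laminar. Thus, our problem can be reduced to finding a maximum common independent set over the two matroids. That is, we will try to compute the following value:

$$\max \{\, |W| : W \in \mathcal{K}_1^{\downarrow} \cap \mathcal{K}_2^{\downarrow} \,\}.$$

Clearly, there is a diverse committee of size $k$ if and only if the maximum value equals $k$. It is well-known that this problem can be solved by Edmonds' matroid intersection algorithm (Edmonds, 1979), given a membership oracle for each $\mathcal{K}_i^{\downarrow}$. The idea is that starting with the empty set, we repeatedly find 'alternating paths' and augment $W$ by one element in each iteration while keeping the property $W \in \mathcal{K}_1^{\downarrow} \cap \mathcal{K}_2^{\downarrow}$. Specifically, we apply the notion of the unique circuit to each of the two matroids and write $C_i(W, y)$ for the unique circuit of the $i$-th matroid contained in $W \cup \{y\}$, for each $i \in \{1, 2\}$. For $W \in \mathcal{K}_1^{\downarrow} \cap \mathcal{K}_2^{\downarrow}$, we define an auxiliary graph $G_W$ on the set of candidates, where the set of arcs is given by $A_W^{(1)} \cup A_W^{(2)}$ with

$$A_W^{(1)} = \{\, (x, y) : y \notin W,\ W \cup \{y\} \notin \mathcal{K}_1^{\downarrow},\ x \in C_1(W, y) \setminus \{y\} \,\},$$
$$A_W^{(2)} = \{\, (y, x) : y \notin W,\ W \cup \{y\} \notin \mathcal{K}_2^{\downarrow},\ x \in C_2(W, y) \setminus \{y\} \,\}.$$

We then look for a shortest path from $Y_W^{(1)}$ to $Y_W^{(2)}$, where

$$Y_W^{(i)} = \{\, y \notin W : W \cup \{y\} \in \mathcal{K}_i^{\downarrow} \,\}$$

for $i \in \{1, 2\}$. We increase the size of $W$ by taking the symmetric difference with the path. It was shown that this procedure computes the desired value. We provide a formal description of the algorithm below (Algorithm 2).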
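To make the augmenting-path idea concrete, here is a minimal Python sketch of unweighted matroid intersection with two independence oracles. It is a textbook-style illustration under simplifying assumptions, not the paper's Algorithm 2: the two matroids are toy partition matroids (at most one candidate per gender and at most one per seniority level) rather than lower extensions of interval constraints, and the names `max_common_independent` and `at_most_one_per` are hypothetical.

```python
from collections import deque

def max_common_independent(ground, indep1, indep2):
    """Grow a common independent set along shortest augmenting paths."""
    w = set()
    while True:
        outside = [y for y in ground if y not in w]
        sources = {y for y in outside if indep1(w | {y})}
        targets = {y for y in outside if indep2(w | {y})}
        # Arcs of the auxiliary graph, derived from single-element exchanges.
        arcs = {v: [] for v in ground}
        for y in outside:
            for x in w:
                swapped = (w - {x}) | {y}
                if y not in sources and indep1(swapped):
                    arcs[x].append(y)  # x lies on the circuit of matroid 1
                if y not in targets and indep2(swapped):
                    arcs[y].append(x)  # x lies on the circuit of matroid 2
        # Breadth-first search yields a shortest source-to-target path.
        parent = {s: None for s in sources}
        queue = deque(sources)
        end = None
        while queue:
            v = queue.popleft()
            if v in targets:
                end = v
                break
            for u in arcs[v]:
                if u not in parent:
                    parent[u] = v
                    queue.append(u)
        if end is None:
            return w  # no augmenting path: w has maximum size
        # Symmetric difference of w with the vertices on the path.
        while end is not None:
            w ^= {end}
            end = parent[end]

def at_most_one_per(attr):
    """Oracle of a partition matroid: no two members share attr[c]."""
    def indep(s):
        seen = [attr[c] for c in s]
        return len(seen) == len(set(seen))
    return indep

gender = {"a": "m", "b": "m", "c": "f"}
seniority = {"a": "junior", "b": "senior", "c": "junior"}

best = max_common_independent(["a", "b", "c"],
                              at_most_one_per(gender),
                              at_most_one_per(seniority))
print(sorted(best))
```

If the search happens to start with candidate `a`, no single addition is possible, and the algorithm recovers by exchanging `a` for `b` and `c` along an alternating path.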

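Similarly to Yokoi (2017), we can efficiently construct an auxiliary graph in each step by maintaining a set $S_i$ such that $W \subseteq S_i$ and $S_i \in \mathcal{K}_i$ for each $i \in \{1, 2\}$: First, as we have seen in Lemma 8, we can determine the membership of a given set in the lower extension $\mathcal{K}_i^{\downarrow}$ in polynomial time. Moreover, it can be easily verified that the unique circuit $C_i(W, y)$ in $W \cup \{y\}$ coincides with the unique circuit $C_i(S_i, y)$ in $S_i \cup \{y\}$ when $W \cup \{y\} \notin \mathcal{K}_i^{\downarrow}$.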

Lemma 12.

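Let $(E, \mathcal{I})$ be a matroid. Let $I$ be an independent set of the matroid, $B$ be a basis with $I \subseteq B$, and $e \in E \setminus I$ with $I \cup \{e\}$ being dependent. Then, the unique circuits in $I \cup \{e\}$ and in $B \cup \{e\}$ coincide, i.e., $C(I, e) = C(B, e)$.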

Proof.

Notice that $B \cup \{e\}$ is dependent: thus it contains a unique circuit $C(B, e)$. Then, $C(I, e) = C(B, e)$ clearly holds since $C(I, e) \subseteq I \cup \{e\} \subseteq B \cup \{e\}$. ∎

Thus, we will show how to maintain such a set with for each . We will first compute a set for each in time. Now suppose that . Let where for , and be a shortest path in with and . Notice that for since otherwise contains a dependent set of a matroid , contradicting I2; similarly, for since otherwise contains a dependent set of a matroid , a contradiction. If , then there is a candidate such that by Lemma 8, and we set to be . Similarly, if , then there is a candidate such that