One of the central problems in artificial intelligence and computational social choice is aggregating preferences of individual agents (see the overview of Conitzer). Here we focus on multi-winner choice, where the goal is to select a -element subset of a set of candidates. Given preferences of the agents, the subset is identified by means of a voting rule. This scenario covers a variety of settings: nations elect members of parliament or societies elect committees, web search engines choose pages to display in response to a query, airlines select movies available on board, companies select a group of products to promote, etc.
In this work we restrict our attention to the situation where each vote (expression of the preferences of an agent) is a subset of the candidates. Various voting rules are studied. In the simplest one, Approval Voting (AV), occurrences of each candidate are counted and the most frequently approved candidates are selected. While this rule has many desirable properties in the single-winner case, in the multi-winner scenario its merits are often considered less clear. Therefore, numerous alternative rules have been proposed (see ), including Satisfaction Approval Voting (SAV: the satisfaction of an agent is the fraction of her approved candidates that are elected; the goal is to maximize the total satisfaction), Proportional Approval Voting (PAV: like SAV, but the satisfaction of an agent whose approved candidates are selected is the -th harmonic number ), and Reweighted Approval Voting (RAV: a -round scheme, in each round another candidate is selected). In this paper we study a rule called Minimax Approval Voting (MAV), introduced by Brams, Kilgour, and Sanver. Here, we see the votes and the choice as 0-1 strings of length (characteristic vectors of the subsets). The goal is to minimize the maximum Hamming distance to a vote. (Recall that the Hamming distance of two strings and of the same length is the number of positions where and differ.)
Our focus is on the computational complexity of computing the choice based on the MAV rule. In the Minimax Approval Voting decision problem, we are given a multiset of 0-1 strings of length (also called votes), and two integers and . The question is whether there exists a string with exactly ones such that for every we have . In the optimization version of Minimax Approval Voting we minimize , i.e., given a multiset and an integer as before, the goal is to find a string with exactly ones which minimizes .
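To make the two problem variants concrete, here is a minimal brute-force sketch of the optimization version. The function names `hamming` and `mav_brute_force`, the parameter name `k` for the required number of ones, and the representation of votes as Python strings are illustrative assumptions, not notation taken from the text. The search is exhaustive and exponential, so it is only meant for tiny instances.

```python
from itertools import combinations


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


def mav_brute_force(votes: list[str], k: int) -> tuple[str, int]:
    """Return a 0-1 string with exactly k ones that minimizes the maximum
    Hamming distance to the votes, together with that distance.

    Hypothetical helper: tries every k-subset of positions.
    """
    m = len(votes[0])
    best, best_dist = None, None
    for ones in combinations(range(m), k):
        chosen = set(ones)
        x = "".join("1" if i in chosen else "0" for i in range(m))
        dist = max(hamming(x, s) for s in votes)
        if best_dist is None or dist < best_dist:
            best, best_dist = x, dist
    return best, best_dist


votes = ["1100", "1010", "0110"]
solution, distance = mav_brute_force(votes, 2)
print(solution, distance)
```

The decision version then amounts to comparing the returned distance with the given bound.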
A reader familiar with string problems might recognize that Minimax Approval Voting is tightly connected with the classical NP-complete problem called Closest String, where we are given strings over an alphabet and the goal is to find a string that minimizes the maximum Hamming distance to the given strings. Indeed, LeGrand showed that Minimax Approval Voting is NP-complete as well by a reduction from Closest String with binary alphabet. This motivated the study of Minimax Approval Voting in terms of approximability and fixed-parameter tractability.
Previous results on Minimax Approval Voting
The first approximation result was a simple 3-approximation algorithm due to LeGrand, Markakis and Mehta, obtained by choosing an arbitrary vote and taking any approved candidates from the vote (extending it arbitrarily to candidates if needed). Next, a 2-approximation was shown by Caragiannis, Kalaitzis and Markakis using an LP-rounding procedure. Finally, Byrka and Sornat recently presented a polynomial time approximation scheme (PTAS), i.e., an algorithm that for any fixed gives a -approximate solution in polynomial time. More precisely, their algorithm runs in time , which is polynomial in the number of voters and the number of alternatives . The PTAS uses information extraction techniques from fixed-size subsets of voters and random rounding of the optimal solution of a linear program.
In the area of fixed parameter tractability (FPT) the goal is to find algorithms with running time of the form , where is the size of the input instance , is a parameter and is a function, which is typically at least exponential for NP-complete problems. For more about parameterized algorithms see the textbook of Cygan et al. or the survey of Bredereck et al. (in the context of computational social choice). The study of FPT algorithms for Minimax Approval Voting was initiated by Misra, Nabeel and Singh. They show for example that Minimax Approval Voting parameterized by the number of ones in the solution (i.e., is the parameter ) is -hard, which implies that there is no FPT algorithm, unless there is a highly unexpected collapse in parameterized complexity classes. From a positive perspective, they show that the problem is FPT when parameterized by the maximum allowed distance . Their algorithm runs in time . (The notation suppresses factors polynomial in the input size. Actually, in the article the authors claim the slightly better running time of . However, it seems there is a flaw in the analysis: it states that the initial solution is at distance at most from the solution, while it can be at distance because of what we call here the -completion operation. This increases the maximum depth of the recursion to , instead of the claimed .)
Previous results on Closest String
It is interesting to compare the known results on Minimax Approval Voting with the corresponding ones on the better researched Closest String. The first PTAS for Closest String was given by Li, Ma and Wang  with running time bounded by . This was later improved by Andoni et al.  to , and then by Ma and Sun  to .
The first FPT algorithm for Closest String, running in time was given by Gramm, Niedermeier, and Rossmanith . This was later improved by Ma and Sun , who gave an algorithm with running time , which is more efficient for constant-size alphabets. No further substantial progress is possible, since Lokshtanov, Marx and Saurabh  have shown that Closest String admits no algorithms in time or , unless the Exponential Time Hypothesis (ETH)  fails.
The discrepancy between the state of the art for Closest String and Minimax Approval Voting raises interesting questions. First, does the additional constraint in Minimax Approval Voting really make the problem harder, and does the PTAS have to be significantly slower? Similarly, although in Minimax Approval Voting the alphabet is binary, no -time algorithm is known, in contrast to Closest String. Can we find such an algorithm? The goal of this work is to answer these questions.
We present three results on the complexity of Minimax Approval Voting. Let us recall that the Exponential Time Hypothesis (ETH) of Impagliazzo et al. states that there exists a constant , such that there is no algorithm solving 3-SAT in time . In recent years, ETH has become the central conjecture used for proving tight bounds on the complexity of various problems, see for a survey. We begin by showing that, unless the ETH fails, there is no algorithm for Minimax Approval Voting running in time . In other words, the algorithm of Misra et al. is essentially optimal, and indeed, in this sense Minimax Approval Voting is harder than Closest String. Motivated by this, we then show a parameterized approximation scheme, i.e., a randomized Monte-Carlo algorithm which, given an instance and a number , finds a solution at distance at most in time or reports that there is no solution at distance at most . Note that our lower bound implies that, under (a randomized version of) ETH, this is essentially optimal, i.e., there is no parameterized approximation scheme running in time . Indeed, if such an algorithm existed, by picking we would get an exact algorithm, which contradicts our lower bound. Finally, we get a new polynomial-time randomized approximation scheme for Minimax Approval Voting, which runs in time . Thus the running time almost matches that of the fastest known PTAS for Closest String (up to a factor in the exponent).
Organization of the paper
In Section 2 we introduce some notation and we recall standard probability bounds that are used later in the paper. In Section 3 we present our lower bound for Minimax Approval Voting parameterized by . Next, in Section 4 we show a parameterized approximation scheme. Finally, in Section 5 we show a new randomized PTAS. The paper concludes with Section 6, where we discuss directions for future work.
2 Definitions and Preliminaries
For every integer we denote . For a set of words and a word we denote . For a string , the number of ’s in is denoted as and it is also called the Hamming weight of ; similarly denotes the number of zeroes. Moreover, the set of all strings of length with ones is denoted by , i.e., . means -th letter of a string . For a subset of positions we define a subsequence by removing letters on positions from .
For a string , any string at distance from is called a -completion of . Note that it is easy to find such a -completion : when we obtain by replacing arbitrary ones in by zeroes; similarly when we obtain by replacing arbitrary zeroes in by ones.
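The completion operation described above can be sketched as follows. The function name `completion`, the parameter name `k` for the target number of ones, and the choice to flip the leftmost available symbols are assumptions made for illustration; the text allows the flipped positions to be arbitrary.

```python
def completion(y: str, k: int) -> str:
    """Return a string with exactly k ones obtained from y by flipping
    as few symbols as possible (the leftmost surplus ones or zeroes)."""
    if not 0 <= k <= len(y):
        raise ValueError("k must be between 0 and len(y)")
    chars = list(y)
    surplus = chars.count("1") - k       # > 0: too many ones, < 0: too few
    source = "1" if surplus > 0 else "0"  # symbol that has to be flipped
    target = "0" if surplus > 0 else "1"
    remaining = abs(surplus)
    for i, c in enumerate(chars):
        if remaining == 0:
            break
        if c == source:
            chars[i] = target
            remaining -= 1
    return "".join(chars)


print(completion("11010", 2))  # one surplus one is replaced by a zero
print(completion("10000", 3))  # two zeroes are replaced by ones
```

The number of flipped positions equals the absolute difference between the number of ones in the input and `k`, which is exactly the distance required of a completion.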
We will use the following standard Chernoff bounds (see, e.g., Chapter 4.1 in ).
Let be independent 0-1 random variables such that for every we have , for . Let . Then,
for any we have:
for any we have:
3 A lower bound
In this section we show a lower bound for Minimax Approval Voting parameterized by . To this end, we use a reduction from a problem called -Clique. In -Clique we are given a graph over the vertex set , i.e., forms a grid with rows and columns, and the question is whether in there is a clique containing exactly one vertex in each row.
Given an instance of -Clique with , one can construct an instance of Minimax Approval Voting, such that is a yes-instance iff is a yes-instance, and the set contains strings of length each. The construction takes time polynomial in the size of the output.
Each string in the set will be of size . Let us split the set of positions into blocks, where the first blocks contain exactly positions each, and the last -th block contains the remaining positions. Our construction will enforce that if a solution exists, it will have the following structure: there will be a single 1 in each of the first blocks and only zeros in the last block. Intuitively, the position of the 1 in the first block encodes the clique vertex of the first row of , the position of the 1 in the second block encodes the clique vertex of the second row of , etc.
We construct the set as follows.
(nonedge strings) For each pair of nonadjacent vertices of belonging to different rows, i.e., , , , we add to a string , where all the blocks except -th and -th are filled with zeros, while the blocks , are filled with ones, except the -th position in block and the -th position in block which are zeros (see Fig. 1). Formally, contains ones at positions . Note that the Hamming weight of equals .
(row strings) For each row we create exactly strings, i.e., for and for each set of exactly positions in the -th block we add to a string having ones at all positions of the -th block and at , all the remaining positions are filled with zeros (see Fig. 2). Note that similarly as for the nonedge strings the Hamming weight of each row string equals , and to achieve this property we use the -th block.
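[Figure 1: A nonedge string. All blocks are filled with zeros except the two blocks corresponding to the rows of the two nonadjacent vertices; each of these two blocks has the form 1 … 1 0 1 … 1, with the single zero at the position of the respective vertex.]
[Figure 2: A row string. The block of the given row is filled with ones (1 … 1), the remaining blocks among the first ones are filled with zeros, and the last block contains ones exactly on the positions of the chosen set, e.g., 0 0 1 0 1 1 0 … 0 1 0.]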
To finish the description of the created instance we need to define the target distance , which we set . Observe that as the Hamming weight of each string equals , for with exactly ones we have if and only if the positions of ones in and have a non-empty intersection.
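The observation above is an instance of a general identity: if two 0-1 strings have a and b ones respectively and share c common ones, their Hamming distance is a + b - 2c. With both weights fixed, the distance therefore depends only on the size of the intersection of the positions of ones. The short check below verifies the identity exhaustively on all pairs of short strings; the function names and the test length are illustrative assumptions.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def common_ones(x: str, y: str) -> int:
    """Number of positions where both strings have a one."""
    return sum(a == b == "1" for a, b in zip(x, y))


def identity_holds(m: int) -> bool:
    """Check hamming(x, y) == ones(x) + ones(y) - 2 * common_ones(x, y)
    for every pair of 0-1 strings of length m."""
    strings = ["".join(bits) for bits in product("01", repeat=m)]
    return all(
        hamming(x, y) == x.count("1") + y.count("1") - 2 * common_ones(x, y)
        for x in strings
        for y in strings
    )


print(identity_holds(5))
```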
Let us assume that there is a clique in of size containing exactly one vertex from each row. For let be the column number of the vertex of from row . Define as a string containing ones exactly at positions , i.e., the -th block contains only zeros and for the -th block contains a single 1 at position . Obviously contains exactly ones, hence it suffices to show that has at least one common one with each of the strings in . This is clear for the row strings, as each row string contains a block full of ones. For a nonedge string , where and , note that does not contain and at the same time. Consequently has a common one with in at least one of the blocks , .
In the other direction, assume that is a string of length with exactly ones such that the Hamming distance between and each of the strings in is at most , which by construction implies that has a common one with each of the strings in . First, we are going to prove that contains a 1 in each of the first blocks (and consequently has only zeros in block ). For the sake of contradiction assume that this is not the case. Consider a block containing only zeros. Let be any set of positions in block containing zeros from (such a set exists as block has positions). But the row string has ones at positions where has zeros, and consequently , a contradiction.
As we know that contains exactly one one in each of the first blocks let be such a position of block . Create by taking the vertex from column for each row . Clearly is of size and it contains exactly one vertex from each row, hence it remains to prove that is a clique in . Assume the contrary and let be two distinct nonadjacent vertices of , where and . Observe that the nonedge string contains zeros at the -th position of the -th block and at the -th position of the -th block. Since for , , block of contains only zeros, we infer that the sets of positions of ones of and are disjoint leading to , a contradiction.
As we have proved that is a yes-instance of -Clique iff is a yes-instance of Minimax Approval Voting, the lemma follows. ∎
In order to derive an ETH-based lower bound we need the following theorem of Lokshtanov, Marx and Saurabh .
Assuming ETH, there is no -time algorithm for -Clique.
We are ready to prove the main result of this section.
Assuming ETH, there is no -time algorithm for Minimax Approval Voting.
4 Parameterized approximation scheme
In this section we show the following theorem.
There exists a randomized algorithm which, given an instance of Minimax Approval Voting and any , runs in time and either
reports a solution at distance at most from , or
reports that there is no solution at distance at most from .
In the latter case, the answer is correct with probability at least , for arbitrarily small fixed .
Let us proceed with the proof. In what follows we assume , since then we can get the claim even if by repeating the whole algorithm times. Indeed, then the algorithm returns an incorrect answer only if each of the repetitions returned an incorrect answer, which happens with probability at most .
Assume we are given a yes-instance and let us fix a solution , i.e., a string at distance at most from all the input strings. Our approach is to begin with a string not very far from , and next perform a number of steps. In the -th step we either conclude that is already a -approximate solution, or with some probability we find another string which is closer to .
First observe that if , then clearly there is no solution and our algorithm reports NO. Hence in what follows we assume
We set to be any -completion of . By (5) we get . Since , by the triangle inequality we get the following bound.
Now we are ready to describe our algorithm precisely (see also Pseudocode 1). We begin with defined as above. Next for we do the following. If for every we have the algorithm terminates and returns . Otherwise, fix any such that . Let and . The algorithm samples a position and a position . Then, is obtained from by swapping the at position with the at position . If the algorithm finishes without finding a solution, it reports NO.
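The following is a hedged sketch of a single run of the procedure just described, not the authors' pseudocode. Several details are assumptions, because the corresponding formulas are not available here: the two sampled position sets are taken to be the positions where the current string has a 0 and the violating vote has a 1, and vice versa; the number of rounds is left as a parameter `rounds`; and the initial feasibility check on the numbers of ones is omitted. All identifiers are hypothetical.

```python
import random


def hamming(x, y):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def completion(y, k):
    """A string with exactly k ones, flipping the leftmost surplus symbols."""
    chars = list(y)
    surplus = chars.count("1") - k
    source, target = ("1", "0") if surplus > 0 else ("0", "1")
    for i, c in enumerate(chars):
        if surplus == 0:
            break
        if c == source:
            chars[i] = target
            surplus += -1 if source == "1" else 1
    return "".join(chars)


def local_search(votes, k, d, eps, rounds, rng=None):
    """One run of the swap-based search; returns a string with k ones
    within distance (1 + eps) * d of every vote, or None (report NO)."""
    rng = rng or random.Random()
    threshold = (1 + eps) * d
    x = list(completion(votes[0], k))
    for _ in range(rounds):
        far = next((s for s in votes if hamming(x, s) > threshold), None)
        if far is None:
            return "".join(x)
        # Assumed position sets: x has 0 where the vote has 1, and vice versa.
        zeros = [i for i, (a, b) in enumerate(zip(x, far)) if a == "0" and b == "1"]
        ones = [i for i, (a, b) in enumerate(zip(x, far)) if a == "1" and b == "0"]
        if not zeros or not ones:
            return None
        p0, p1 = rng.choice(zeros), rng.choice(ones)
        x[p0], x[p1] = "1", "0"  # swap keeps the number of ones equal to k
    if all(hamming(x, s) <= threshold for s in votes):
        return "".join(x)
    return None


print(local_search(["1100", "0011"], k=2, d=2, eps=0.0, rounds=2,
                   rng=random.Random(0)))
```

Since a single run succeeds only with some probability, an actual implementation would repeat it many times before reporting NO.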
The following lemma is the key to obtaining a lower bound on the probability that the ’s get close to .
Let be a string in such that for some . Let be any solution, i.e., a string at distance at most from all the strings , . Denote
Let be the set of positions on which and differ, i.e., . (See Fig. 3.) Note that . Let .
The intuition behind the proof is that if is small, then differs too much from , either because is similar to (when ) or because has much more 1’s than (when differs much from ).
We begin with a couple of useful observations on the number of ones in different parts of , and . Since and are the same on , we get
Since , we get , and further
Finally note that
We are going to derive a lower bound on . First,
On the other hand,
It follows that
Hence, as required. ∎
Assume that there is a solution and that the algorithm created a string , for some . Then,
5 A fast polynomial time approximation scheme
The goal of this section is to present a PTAS for Minimax Approval Voting running in time . It is achieved by combining the parameterized approximation scheme from Theorem 4.1 with the following result, which might be of independent interest. Throughout this section denotes the value of the optimum solution for the given instance of Minimax Approval Voting, i.e., ,
There exists a randomized polynomial time algorithm which, for arbitrarily small fixed , given an instance of Minimax Approval Voting and any such that , reports a solution, which with probability at least is at distance at most from .
In what follows, we prove Theorem 5.1. As in the proof of Theorem 4.1 we assume w.l.o.g. . Note that we can assume , for otherwise it suffices to use the 2-approximation of Caragiannis et al. . We also assume , for otherwise it is a straightforward exercise to find an optimal solution in linear time. Let us define a linear program (13–16):
The linear program (13–16) is a relaxation of the natural integer program for Minimax Approval Voting, obtained by replacing (16) by the discrete constraint . Indeed, observe that corresponds to the -th letter of the solution , (14) states that , and (15) states that .
Our algorithm is as follows (see Pseudocode 2). First we solve the linear program in time using the interior point method. Let be the obtained optimal solution. Clearly, . We randomly construct a string , guided by the values . More precisely, for every independently, we set with probability . Note that need not contain ones. Let be any -completion of . The algorithm returns .
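The rounding and completion steps can be sketched as below. This sketch assumes that each position is independently set to 1 with probability equal to its fractional value in the LP solution, and it takes that fractional solution as an input list `x_frac` rather than solving the linear program. The function name, the parameter names, and the choice of which positions to flip during completion are illustrative assumptions.

```python
import random


def round_and_complete(x_frac, k, rng=None):
    """Randomly round a fractional vector to a 0-1 string, then repair it
    so that it contains exactly k ones (leftmost surplus symbols flipped)."""
    rng = rng or random.Random()
    # Independent rounding: position j becomes 1 with probability x_frac[j].
    y = [1 if rng.random() < p else 0 for p in x_frac]
    surplus = sum(y) - k                 # > 0: too many ones, < 0: too few
    source = 1 if surplus > 0 else 0     # symbol to flip during completion
    for i in range(len(y)):
        if surplus == 0:
            break
        if y[i] == source:
            y[i] = 1 - source
            surplus += -1 if source == 1 else 1
    return "".join(map(str, y))


# With an integral input the rounding is deterministic.
print(round_and_complete([1.0, 0.0, 1.0, 0.0], 2, random.Random(1)))
# With a fractional input only the number of ones is guaranteed.
print(round_and_complete([0.5, 0.5, 0.5, 0.5], 2, random.Random(1)).count("1"))
```

Separating the rounding from the LP solver keeps the sketch free of external dependencies; any LP solver could supply `x_frac`.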
Clearly, the above algorithm runs in polynomial time. In what follows we bound the probability of error. To this end we prove upper bounds on the probability that is far from and the probability that the number of ones in is far from . This is done in Lemmas 5.2 and 5.3.
we define a random variable that measures the distance between and
Note that are independent 0-1 random variables.
Using linearity of the expectation we obtain
Note that is a sum of independent 0-1 random variables when and otherwise. Denote . We apply Chernoff bounds. For we have
In case we proceed analogously, using the Chernoff bound (3)
Now we use the union bound to get the claim.