Mixed Integer Programming with Convex/Concave Constraints: Fixed-Parameter Tractability and Applications to Multicovering and Voting

09/08/2017 ∙ by Robert Bredereck, et al. ∙ 0

A classic result of Lenstra [Math. Oper. Res. 1983] says that an integer linear program can be solved in fixed-parameter tractable (FPT) time for the parameter being the number of variables. We extend this result by incorporating non-decreasing piecewise linear convex or concave functions to our (mixed) integer programs. This general technique allows us to establish parameterized complexity of a number of classic computational problems. In particular, we prove that Weighted Set Multicover is in FPT when parameterized by the number of elements to cover, and that there exists an FPT-time approximation scheme for Multiset Multicover for the same parameter. Further, we use our general technique to prove that a number of problems from computational social choice (e.g., problems related to bribery and control in elections) are in FPT when parameterized by the number of candidates. For bribery, this resolves a nearly 10-year old family of open problems, and for weighted electoral control of Approval voting, this improves some previously known XP-memberships to FPT-memberships.


1 Introduction

A computational problem is parameterized if a certain feature of its input is distinguished as the parameter (e.g., the parameter can be the bound on the size of a certain part of the input). A parameterized problem is fixed-parameter tractable (in FPT, in short) if there exists an algorithm that, given an instance $I$ with parameter $k$, can compute the answer to this problem in $f(k) \cdot |I|^{O(1)}$ time, where $f$ is a computable function which depends only on the parameter and $|I|$ is the length of the encoding of $I$. Lenstra’s [26] famous result says that integer linear programs (ILPs) can be solved in FPT time with respect to the number of integer variables. This result gives a very powerful tool for proving that certain problems are fixed-parameter tractable. We show that we can replace certain integer variables by their “non-decreasing piecewise linear convex or concave transformations”, and still the corresponding programs can be solved in FPT time. This is the first main contribution of this paper; we then argue that in many cases this result gives a more convenient tool than Lenstra’s original theorem.

Specifically, the following technique is applied many times in the study of parameterized complexity: formulate a given problem as an integer linear program and check whether the number of variables is upper bounded by a function which depends only on the parameter at hand; if so, apply Lenstra’s algorithm and obtain fixed-parameter tractability. One problem with this technique is that sometimes there are non-linear constraints; in this work we show that if these constraints are piecewise linear (or can be made such), then one can use our general technique (as a black box) and obtain fixed-parameter tractability as well. In this sense, we strengthen Lenstra’s result by providing a more powerful general technique (which of course builds upon Lenstra’s result).

Our technique can be applied to a broad class of computational problems. In particular, we demonstrate its applications through a family of classic covering problems. We first consider Weighted Set Multicover, where the input is given by a set of elements $U = \{x_1, \ldots, x_n\}$, a multiset $\mathcal{S} = \{S_1, \ldots, S_m\}$ of sets over $U$, integer weights $w_1, \ldots, w_m$ for the sets, integer covering requirements $r_1, \ldots, r_n$ for the elements of $U$, and a budget $B$. The question is whether it is possible to pick a collection of sets from $\mathcal{S}$, with their total weight not exceeding $B$, so that each element $x_i$ belongs to at least $r_i$ selected subsets.

Notice that Set Multicover (without the weights) admits a straightforward formulation as an integer linear program (in short, create a variable for each element and for each set, utilizing the fact that the number of distinct sets is upper bounded by $2^n$, where $n$ is the number of elements). Introducing the weights in Weighted Set Multicover breaks this formulation, since some constraints become non-linear. We demonstrate that a fairly straightforward application of our general result leads to an FPT algorithm for Weighted Set Multicover with respect to the number $n$ of elements. By allowing convex transformations of integer variables in an ILP, we are able to effectively handle the weights of the subsets from $\mathcal{S}$. Intuitively, we construct a function that, given a certain number of subsets belonging to the “same class”, returns the minimal total weight of the “best” subsets from this class. This function is convex and it is used to enforce the budget constraint.

Second, we consider Multiset Multicover, a variant of Weighted Set Multicover where the weights of the sets are all equal, but where $\mathcal{S}$ can contain multisets over $U$ rather than simply sets. We show that our general technique can be used as a building block in the construction of an FPT-time approximation scheme for Multiset Multicover for the parameter being the number of elements (there is no hope for an FPT exact algorithm for this problem, since a simple reduction from the Subset Sum problem proves that Multiset Multicover is NP-hard even for $n = 2$). In this case, our analysis is technically involved: it combines certain combinatorial arguments with the previous technique of solving ILPs with concave transformations. This is the second main contribution of this paper.

Third, we demonstrate applications of our results to certain problems from computational social choice. In particular, our general result allows us to resolve the computational complexity status of a number of election problems parameterized by the number of candidates, for the case where voters are unweighted but have prices. These resolved problems include, for example, various bribery problems [11, 12] and priced control problems [27] that were known to be in XP for nearly 10 years, but were neither known to be fixed-parameter tractable (in FPT), nor to be W[1]-hard (see Footnote 1). Our technique also applies to weighted voter control for Approval voting, improving results of Faliszewski et al. [14], and to problems pertaining to finding winners according to several multiwinner election rules, as discussed by Faliszewski et al. [15] and Peters [29].

Footnote 1: A parameterized problem is in XP if there exists an algorithm that, given an instance $I$ with parameter $k$, can compute the answer to this problem in time $|I|^{f(k)}$, where $f$ is some computable function. In other words, XP is the class of those problems that can be solved in polynomial time under the assumption that the parameter is a constant. In contrast, problems which are NP-hard even for constant values of the parameter are said to be para-NP-hard with respect to the parameter. For further information, we point the readers to textbooks on parameterized complexity theory [8, 10, 17, 28].

Our technique was already used in other contexts, for example to prove fixed-parameter tractability of several problems in multiagent scheduling [19]. Specifically, in the scheduling problems studied by Hermelin et al. [19] there are weights for the jobs or different processing times for different jobs, thus formulating the problems as integer linear programs causes non-linear constraints; it turns out that our general technique helps in proving fixed-parameter tractability for these problems as well.

Related Work

Integer Programming.

Lenstra [26] showed that Mixed Integer Linear Programming is fixed-parameter tractable with respect to the number of integer variables. Frank and Tardos [18] and Kannan [21] improved the corresponding running time bounds.

Covering.

The class of covering problems is a fundamental class of computational problems. The Set Cover problem, which was one of the problems studied in Karp’s seminal paper [22], can arguably be considered the representative problem in this class. It is thus known that Set Cover is NP-hard. Further, it is W[2]-hard [10] and arguably the representative problem in this hardness class as well. On the positive side, it is known that Set Cover is fixed-parameter tractable with respect to the number of elements. Covering problems are an important class of optimization problems and have various applications in domains such as software engineering (e.g., covering scenarios by few test cases), antivirus development (looking for a set of suspicious byte strings which covers all known viruses), and databases (finding a set of labels which covers all data items), to name just a few.

There is a vast literature on the Set Cover problem, thus we only briefly point out selected results. It is known that a simple greedy algorithm gives a  approximation guarantee, where  is the number of sets in the solution (see, e.g., the textbook of Vazirani [34]) and that unless , no polynomial-time algorithm can approximate the problem with a better ratio [16]. The variant of the problem where each element appears in at most $f$ sets can be approximated with the ratio $f$ [34]. Parameterized approximation algorithms for the problem were considered by Bonnet et al. [2], Skowron and Faliszewski [33], and Skowron [32]. Rajagopalan and Vazirani studied approximability of multi-set multi-cover problems [30]. Exact algorithms for this problem were studied by Hua et al. [20]. Approximation algorithms for covering integer programs were considered by Kolliopoulos [23] and Kolliopoulos and Young [24].

Voting.

We refer to the books of Rothe [31] and Brandt et al. [3] for a general account on voting problems. Algorithmic problems that model the manipulation of elections include, among others, strategic voting problems (where we are given an election with honest voters and we ask whether a group of manipulators can cast votes to ensure their preferred candidate’s victory), election control problems (where we are given an election and ask if we can ensure a given candidate’s victory by adding/deleting candidates/voters), or bribery and campaign management problems (where we want to ensure a given candidate’s victory by changing some of the votes, but where each vote change comes at a price and we are bound by a budget). We focus on the case where we have a few candidates but (possibly) many voters. This is a very natural setting and it models many real-life scenarios such as political elections or elections among company stockholders.

The complexity of manipulating elections with few candidates is, by now, very well understood. On the one hand, if the elections are weighted (as is the case for the elections held by company stockholders), then our problems are typically NP-hard even if the number of candidates is a small fixed constant [7, 12, 14]; these results typically follow by reductions from the well-known NP-hard Partition problem. One particular example where we do not have NP-hardness is control by adding/deleting voters under the Approval and $k$-Approval voting rules. Faliszewski et al. [14] have shown that these problems are in XP, that is, that they can be solved in polynomial time if the number of candidates is assumed to be a constant. On the other hand, if the elections are unweighted (as is the case for political elections) and no prices are involved, then we typically get FPT results. These results are often obtained by expressing the respective problems as integer linear programs (ILPs) and then applying Lenstra’s algorithm [26]. For example, for control by adding voters we can have a program with a separate integer variable for each possible preference and count how many voters with each preference we add [13] (the constraints ensure that we do not add more voters with a given preference than are available and that the desired candidate becomes a winner). Since the number of different preferences is a function depending only on the number of candidates, we can solve such an ILP using Lenstra’s algorithm in FPT time. Typically, this approach does not work for weighted elections as weights give voters a form of “identity”: e.g., in the control example, it no longer suffices to specify how many voters to add; we need to know exactly which ones to add (the trick in showing XP-membership for weighted voter control under Approval is to see that, for each possible voter preference, we add only the heaviest voters with this preference [14]).
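The counting idea behind such programs can be sketched in Python. Voters with identical preference orders are interchangeable, so it suffices to decide how many voters of each type to add. The snippet below is only an illustration under assumed names (`group_by_preference`, `can_make_plurality_winner`): it uses the Plurality rule and brute-force enumeration in place of Lenstra's algorithm, and it is not the construction from [13].

```python
from collections import Counter
from itertools import product

def group_by_preference(voters):
    """Map each preference order (a tuple of candidates) to its number of voters."""
    return Counter(tuple(v) for v in voters)

def can_make_plurality_winner(registered, unregistered, preferred, max_added):
    """Decide whether adding at most max_added unregistered voters makes
    `preferred` a (co-)winner under Plurality.

    Illustrative stand-in for the ILP: one integer counter per preference type.
    """
    base = Counter(v[0] for v in registered)  # current Plurality scores
    types = sorted(group_by_preference(unregistered).items())
    candidates = {c for v in list(registered) + list(unregistered) for c in v}
    # Enumerate how many voters of each type to add (the ILP's integer variables).
    for counts in product(*(range(avail + 1) for _, avail in types)):
        if sum(counts) > max_added:
            continue
        score = Counter(base)
        for (pref, _), k in zip(types, counts):
            score[pref[0]] += k  # a Plurality vote goes to the top choice
        if all(score[preferred] >= score[c] for c in candidates):
            return True
    return False
```

With $m$ candidates there are at most $m!$ preference types, so the number of counters depends only on the number of candidates, which is what makes the ILP formulation amenable to Lenstra's algorithm.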

The main missing piece in our understanding of the complexity of manipulating elections with few candidates regards those unweighted-election problems where each voter has some sort of price (for example, as in the bribery problems). In this paper we almost completely fill this gap by showing a general approach for proving FPT membership for a class of bribery-like problems parameterized by the number of candidates, for unweighted elections (see Footnote 2). As a side effect, we also get FPT membership for weighted control under the Approval and $k$-Approval rules.

Footnote 2: One problem for which our technique does not apply is Swap Bribery [11]; even though Dorn and Schlotter [9] claim that it is in FPT when parameterized by the number of candidates, their proof applies only to uniform price functions. Fixed-parameter tractability of Swap Bribery with arbitrary price functions has been shown very recently [25].

2 MIP with Piecewise Linear Convex/Concave Functions

To show fixed-parameter tractability, integer linear programming has become a powerful tool. This is due to a famous result by Lenstra Jr [26], which was later improved by Frank and Tardos [18] and Kannan [21]. Often, the key achievement of the mentioned results is read as “integer linear programming is fixed-parameter tractable with respect to the number of variables”. Formally they considered the following decision problem.

Mixed Integer Linear Programming (MIP)
Input: An $m \times n$ matrix $A$ with integer elements and a length-$m$ integer vector $b$.
Question: Is there a length-$n$ vector $x$ such that $A \cdot x \le b$, $x_i \in \mathbb{Z}$ for $1 \le i \le p$, and $x_i \in \mathbb{R}$ for $p < i \le n$?

We interpret the entries of $x$ as variables and the rows of $A$ as constraints and use the standard syntax from linear programming. The following result is due to Frank and Tardos [18] and Kannan [21], who improved the running time of Lenstra’s original algorithm [26].

Theorem 1 ([26, 18, 21]).

Mixed Integer Linear Programming can be solved using $O(p^{2.5p + o(p)} \cdot |I|)$ arithmetic operations, where $|I|$ is the number of bits encoding the input and $p$ is the number of integer variables.

To extend this fixed-parameter tractability result to integer linear programming formulations, one has to take care of the additional objective function on $x$. In most cases, we can simply assume some upper or lower bound for the objective value (which is often explicitly given in the decision variant of the problem) and replace the objective function by an additional constraint. If one only has to consider a bounded number $N$ of possible values for the objective function, then one can even simulate the minimization or maximization process by decreasing or increasing the bound in a binary-search manner. This gives an additional factor of $O(\log N)$ in the running time bound.
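The binary-search simulation of an objective can be sketched as follows. This is a minimal illustration, assuming a hypothetical callable `is_feasible(v)` that stands for one run of the decision procedure with the extra constraint "objective at most `v`", and assuming that feasibility is monotone in `v`.

```python
def minimize_with_feasibility_oracle(is_feasible, lo, hi):
    """Return the smallest integer bound v in [lo, hi] with is_feasible(v) True,
    or None if even hi is infeasible.

    Assumes monotonicity: if the objective can be kept <= v,
    then it can also be kept <= v + 1.
    """
    if not is_feasible(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_feasible(mid):
            hi = mid          # mid works, so the optimum is at most mid
        else:
            lo = mid + 1      # mid fails, so the optimum is larger
    return lo
```

Each loop iteration halves the interval, so the decision procedure is called a logarithmic number of times in the number of candidate objective values.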

In this section, we describe a new general technique which allows us to design FPT algorithms for a wide class of optimization problems. In the most general framework, our problems can be viewed as relaxations of integer linear programs. Technically, our result gives a convenient way of using Lenstra’s famous algorithm: we show that an ILP can be relaxed by admitting convex or concave, piecewise linear transformations of integer variables, and that such relaxed programs can still be solved in FPT time. This general result will be used throughout the paper to derive a number of results for more specific types of problems.

2.1 Piecewise Linear Convex/Concave Functions

We consider two simple classes of piecewise linear functions: piecewise linear convex functions and piecewise linear concave functions. These are continuous convex (resp. concave) functions defined on the set of real numbers, whose graphs are composed of straight-line segments. An example from each class is illustrated in Figure 1. For a piecewise linear convex function $f$ we can decompose its domain into a minimal number of disjoint consecutive intervals such that $f$ restricted to each of these intervals is a linear function. The number of these intervals is the number of pieces of $f$. Going from left to right, we refer to $f$ restricted to the first interval as piece zero of $f$, to $f$ restricted to the next interval as the first piece of $f$, to $f$ restricted to the interval after that as the second piece of $f$, etc. The derivative of a piece is the (constant) slope of $f$ on the corresponding interval.
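One possible way to represent such functions in code is by their breakpoints and per-piece slopes. The class below is a sketch under assumptions made here, not notation from the paper: the name `PiecewiseLinear`, the restriction to nonnegative arguments, and the convention that `slopes[r]` is the derivative of piece `r` are all illustrative choices.

```python
class PiecewiseLinear:
    """Continuous piecewise linear function on [0, inf) with value0 = f(0).

    breakpoints[r] is where piece r ends and piece r + 1 begins;
    slopes[r] is the derivative of piece r, so there is one more slope
    than there are breakpoints.
    """

    def __init__(self, breakpoints, slopes, value0=0.0):
        assert len(slopes) == len(breakpoints) + 1
        assert list(breakpoints) == sorted(breakpoints)
        assert all(b > 0 for b in breakpoints)
        self.breakpoints = list(breakpoints)
        self.slopes = list(slopes)
        self.value0 = value0

    def __call__(self, x):
        """Evaluate f(x) for x >= 0 by walking over the pieces left to right."""
        value, left = self.value0, 0.0
        for right, slope in zip(self.breakpoints + [float("inf")], self.slopes):
            if x <= right:
                return value + slope * (x - left)
            value += slope * (right - left)
            left = right

    def is_convex(self):
        # Convex exactly when the slopes never decrease from piece to piece.
        return all(a <= b for a, b in zip(self.slopes, self.slopes[1:]))

    def is_concave(self):
        # Concave exactly when the slopes never increase from piece to piece.
        return all(a >= b for a, b in zip(self.slopes, self.slopes[1:]))
```

For example, `PiecewiseLinear([2, 5], [1, 3, 4])` has three pieces with slopes 1, 3 and 4 and is convex, while `PiecewiseLinear([1], [2, 0.5])` has two pieces and is concave.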

Figure 1: An example of a piecewise linear convex function (left plot) and a piecewise linear concave function (right plot) with three pieces.

2.2 Fixed-Parameter Tractability of Relaxed Integer Programs

The following central problem of our framework is a relaxation of Mixed Integer Programming.

Mixed Integer Programming with Simple Piecewise Linear Transformations (MIPwSPLT)
Input: A collection of piecewise linear convex functions , a collection of piecewise linear concave functions , and a vector .
Question: Is there a vector  such that

Our theorem generalizes Lenstra’s result for mixed integer linear programming [26, Section 5].

Theorem 2.

Mixed Integer Programming with Simple Piecewise Linear Transformations can be solved in time where  is the number of integer variables,  is the maximum number of pieces per function, and  is the number of bits encoding the input.

Proof.

To prove the theorem, we will reduce MIPwSPLT to MIP. To this end, we show how to replace each non-linear constraint by a polynomial number of linear constraints, leading to an instance whose feasible solutions can be directly translated back into feasible solutions of the original problem. The corresponding mixed integer program will have the same integer variables as the original instance and a certain number of rational variables which is bounded by a polynomial function of the size of the input. This will allow us to use Lenstra’s result in its variant for mixed integer programming (see [26, Section 5]), which says that mixed integer programs can be solved in FPT time for the parameter being the number of integer variables.

Let us describe the construction of our mixed integer program. Let and denote the number of linear pieces of functions and , respectively. Recall that we consider the canonical form where all variables are nonnegative. Hence, we can assume that the zero-index piece of every function covers point  (pieces covering only negative points are irrelevant). Furthermore, by appropriately setting the  coefficients, we can also assume that (resp. ).

We start with a copy of the MIPwSPLT instance and successively transform it into an ordinary MIP instance. To this end, we keep all integer variables and all real-valued variables of the original MIPwSPLT instance. Next, we describe how to replace the nonlinear constraints, introduce the necessary (additional) real-valued variables, and discuss the correctness of these replacements.

Replacing Nonlinear Constraints

We first describe how to replace some constraint that uses piecewise linear convex (resp. concave) transformations by at most additional constraints and variables.

Additional Variables.

For each we introduce two real-valued variables:  and ; intuitively, if there exists a feasible solution to the original MIPwSPLT instance, then there exists a feasible solution where is equal to and where is equal to . Finally, for each we introduce real-valued variables and real-valued variables ; intuitively, there is always a feasible solution where and , thus (resp. ) measures by how much the th piece of function  (resp. ) intersects with the interval .

Constraints (Convex Case).

In what follows, we assume that the th constraint uses a nontrivial (that is, not simply linear) piecewise linear convex transformation  on at least one variable . First, for each variable we introduce two constraints:

(1)

Second, for each we introduce the constraint:

(2)

Constraints (Concave Case).

Analogously to the convex case, we assume that the th constraint uses a nontrivial (that is, not simply linear) piecewise linear concave transformation  on at least one variable . Again, we first introduce two constraints for each variable (these two constraints are almost identical to the convex case):

(3)

Second, for each we introduce the constraint:

(4)

Finally, we replace the th original constraint by:

(5)

Correctness (Convex Case).

In order to satisfy Constraint (5), smaller values of are clearly more desirable. Since each  only occurs once (on the right-hand side) in Constraint (5) and once (on the left-hand side) in one constraint from Constraint Set (2), we can infer that each constraint from Constraint Set (2) can be satisfied with equality. Further, together with the fact that is convex, and consequently for each , we infer that the values can be as small as possible. Formally, similarly as above, we infer from Constraint Set (1) that if there exists a feasible solution to our program, then there exists a feasible solution where for each variable  it holds that . Consequently, we conclude that:

Let us focus now on the above equality. If , then we surely have . Next, we analyze how the value of changes when we increase  by . If does not exceed , then increases by . If is greater than but smaller than , then increases by , and so on. Thus, we see that .
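The step-by-step argument above can be checked numerically on a small example. The sketch below uses hypothetical names (`fill_in_order`, `cheapest_split_cost`) and, for simplicity, integer splits only: for nondecreasing slopes, splitting $x$ over the pieces so as to minimize the slope-weighted sum fills the pieces from left to right, and the resulting cost equals the value of the convex function at $x$ (assuming the function is 0 at 0).

```python
from itertools import product

def fill_in_order(lengths, x):
    """Split x over the pieces left to right; lengths[r] is the width of piece r.
    The last piece is unbounded, so lengths has one entry fewer than slopes."""
    parts = []
    for width in lengths:
        take = min(width, x)
        parts.append(take)
        x -= take
    parts.append(x)  # the remainder goes to the unbounded last piece
    return parts

def cheapest_split_cost(lengths, slopes, x):
    """Brute force over integer splits z with sum(z) == x and z[r] <= lengths[r]:
    the smallest value of sum(slopes[r] * z[r])."""
    best = None
    for head in product(*(range(w + 1) for w in lengths)):
        rest = x - sum(head)
        if rest < 0:
            continue
        cost = sum(s * z for s, z in zip(slopes, head + (rest,)))
        best = cost if best is None else min(best, cost)
    return best
```

With `lengths = [2, 3]` and `slopes = [1, 3, 4]`, `fill_in_order(lengths, 6)` gives `[2, 3, 1]` with cost 15, and no other split is cheaper. The concave case is symmetric: with nonincreasing slopes, the left-to-right split maximizes the weighted sum.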

Correctness (Concave Case).

In order to satisfy Constraint (5), larger values of are clearly more desirable. Since each  only occurs once (on the left-hand side) in Constraint (5) and once (on the right-hand side) in one constraint from Constraint Set (4), we can infer that each constraint from Constraint Set (4) can be satisfied with equality. Further, together with the fact that is concave, and consequently for each , we infer that the values can be as small as possible. Formally, similarly as above, we infer from Constraint Set (3) that there exists a feasible solution where for each variable  it holds that . Consequently, we conclude that:

Let us focus now on the above equality. If , then we obviously have . Next, we analyze how the value of changes when we increase  by . If does not exceed , then increases by . If is greater than but smaller than , then increases by , and so on. Thus, we see that .

Summarizing, we successively replaced each non-linear constraint by a set of equivalent linear constraints, so any feasible solution for our constructed MIP instance immediately gives us a feasible solution for the original MIPwSPLT instance.
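As a compact illustration of the replacement, consider a single convex function $f$ with $f(0) = 0$ applied to a variable $x \ge 0$. The notation below is assumed here for the sketch and is not taken from the numbered constraints above: $\ell_0, \ldots, \ell_{k-1}$ are the widths of the bounded pieces, $s_0 \le s_1 \le \cdots \le s_k$ are the slopes, $z_r$ is the auxiliary variable of piece $r$, and $y$ stands in for $f(x)$.

```latex
\begin{align*}
  x &= \sum_{r=0}^{k} z_r, \qquad 0 \le z_r \le \ell_r \ \ (0 \le r < k), \qquad z_k \ge 0, \\
  y &\ge \sum_{r=0}^{k} s_r \, z_r .
\end{align*}
```

Under these assumptions, $y$ replaces $f(x)$ on the side of the constraint where smaller values help. Because the slopes are nondecreasing, the minimum of the right-hand side over all admissible $z_r$ equals $f(x)$, so $y = f(x)$ is attainable and no smaller value is. For a concave function with nonincreasing slopes, the analogous sketch uses $y \le \sum_r s_r z_r$ and places $y$ on the side where larger values help.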

The corresponding running-time upper-bound is where  denotes the number of integer variables and  is the number of bits needed to encode our MIP [26] (see also the works of Frank and Tardos [18] and Kannan [21] for improvements in the running time upper bounds). Finally, can be upper-bounded by since we introduced at most additional constraints and variables. This completes the proof. ∎

We conclude this section with two observations regarding the generality of Theorem 2. First, observe that in this section we used the canonical form of MIPwSPLT, requiring all the variables to be non-negative. Yet, as long as we do not actually use the piecewise linear transformations on a variable $x_j$ (that is, as long as for each $i$ the functions $f_{i,j}$ and $g_{i,j}$ are linear) we can use the standard technique of replacing each occurrence of $x_j$ with $x_j^+ - x_j^-$, where $x_j^+$ and $x_j^-$ are two nonnegative variables denoting, respectively, the positive and the negative part of $x_j$. In other words, we may allow negative values for each variable $x_j$ whose associated functions $f_{i,j}$ and $g_{i,j}$ are simply linear.

Second, observe that, similarly to standard MIPs, objective functions can be simulated by an additional constraint on the expression to be optimized, and by decreasing or increasing the bound in such constraint in a binary-search manner. It is also possible to minimize (respectively, maximize) objective functions using piecewise linear concave (respectively, convex) transformations.

3 Covering and Voting: Showcases of the General Technique

In this section we demonstrate how to apply our technique introduced in Section 2 to two classes of problems: (i) generalizations of the Max Cover problem, and (ii) selected bribery and control problems for approval voting.

3.1 Weighted Multiset Multicover with Small Universe

We start by focusing on the complexity of a few generalizations of the Max Cover problem. As many of the studied covering problems consider multisets, the following notation will be useful. If $A$ is a multiset and $x$ is some element, then we write $A(x)$ to denote the number of times $x$ occurs in $A$ (that is, $A(x)$ is $x$’s multiplicity in $A$). If $x$ is not a member of $A$, then $A(x) = 0$.

Definition 1.

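In the Weighted Multiset Multicover (WMM) problem we are given a multiset $\mathcal{S} = \{S_1, \ldots, S_m\}$ of multisets over the universe $U = \{x_1, \ldots, x_n\}$, integer weights $w_1, \ldots, w_m$ for the multisets, integer covering requirements $r_1, \ldots, r_n$ for the elements of the universe, and an integer budget $B$. We ask whether there exists a subfamily $\mathcal{S}'$ of multisets from $\mathcal{S}$ such that:

  1. for each $x_i \in U$ it holds that $\sum_{S_j \in \mathcal{S}'} S_j(x_i) \ge r_i$ (that is, each element $x_i$ is covered at least the required number of times), and

  2. $\sum_{S_j \in \mathcal{S}'} w_j \le B$ (the budget is not exceeded).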

By a straightforward polynomial-time reduction from Partition, we observe that WMM is NP-hard even for the case of a single-element universe. Clearly, this also means that the problem is para-NP-hard with respect to the number of elements in the universe.

Proposition 1.

WMM is NP-complete even for the case of a single-element universe.

Proof.

Membership in NP is clear. We show NP-hardness by a reduction from the Partition problem. An instance of Partition consists of a sequence of nonnegative integers $k_1, \ldots, k_n$. We ask if there is a set $I \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in I} k_i = \frac{1}{2} \sum_{i=1}^{n} k_i$.

We form an instance of Weighted Multiset Multicover as follows. The universe contains a single element $x$ with covering requirement equal to $\frac{1}{2} \sum_{i=1}^{n} k_i$. For each $i$, $1 \le i \le n$, there is a single multiset containing $k_i$ occurrences of $x$, with weight $k_i$. We set the budget to be $\frac{1}{2} \sum_{i=1}^{n} k_i$. Clearly, it is possible to cover $x$ sufficiently many times if and only if our input instance of Partition is a “yes”-instance. ∎
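The reduction can be sketched in a few lines of Python. The function names are hypothetical, the instance is assumed to set multiplicity, weight, requirement and budget as read from the proof (half of the total for the last two), and the rounding for odd totals is a detail added here so that such instances stay unsatisfiable.

```python
from itertools import combinations

def partition_to_wmm(numbers):
    """Build a single-element WMM instance from a Partition instance.

    Returns (multiplicities, weights, requirement, budget): multiset i contains
    the only universe element numbers[i] times and has weight numbers[i].
    """
    total = sum(numbers)
    requirement = (total + 1) // 2  # equals total / 2 when total is even
    budget = total // 2             # for odd totals: requirement > budget
    return list(numbers), list(numbers), requirement, budget

def wmm_single_element_solvable(multiplicities, weights, requirement, budget):
    """Brute force: is there a subfamily covering the element at least
    `requirement` times with total weight at most `budget`?"""
    indices = range(len(weights))
    for size in range(len(weights) + 1):
        for chosen in combinations(indices, size):
            covered = sum(multiplicities[i] for i in chosen)
            cost = sum(weights[i] for i in chosen)
            if covered >= requirement and cost <= budget:
                return True
    return False
```

Since coverage and weight coincide for every multiset, a feasible subfamily must hit exactly half of the total, which is a Partition solution. The brute-force checker is exponential and only meant for tiny sanity checks.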

Another variant of WMM is Multiset Multicover, where we assume each set to have unit weight. By generalizing the proof for Proposition 1, we show that this problem is NP-hard already for two-element universes, which again implies para-NP-hardness with respect to the number of elements in the universe.

Proposition 2.

Multiset Multicover is NP-complete even for universes of size two.

Proof.

Membership in is clear. To show -hardness, we give a reduction from a variant of the Subset Sum problem. We are given a sequence of positive integers, a target value , and we ask if there is a set such that (a) , and (b) .

Let be . We form an instance of Multiset Multicover that contains two elements, and . For each , , we form a set that contains with multiplicity , and with multiplicity . We set the covering requirement of to be , and the covering requirement of to be . We ask if there is a multiset multicover of size exactly .

Clearly, if there is a solution for our Subset-Sum instance, then the sets that correspond to this solution form a multiset multicover of and . On the contrary, assume that there is a collection of at most sets that form a multiset multicover of and . There must be exactly  of these sets. Otherwise, the sum of their multiplicities for  would be different from . Due to the covering requirement of , these sets correspond to the numbers from that sum up to at least , and due to covering requirement of , these sets correspond to numbers that sum up to at most . This completes the proof. ∎
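A small Python sketch can make the shape of this reduction concrete. Because several specifics are not recoverable from the text above, everything below is an assumption made for illustration: the Subset Sum variant is taken to ask for exactly `size` numbers summing to `target`, the constant `big` is taken to be the largest input number, and set $i$ is given multiplicities `k_i` and `big - k_i` for the two elements. The helper names are hypothetical.

```python
from itertools import combinations

def subset_sum_to_multiset_multicover(numbers, target, size):
    """Assumed reduction: two-element universe, one multiset per number."""
    big = max(numbers)
    multisets = [(k, big - k) for k in numbers]     # multiplicities of the two elements
    requirements = (target, size * big - target)
    return multisets, requirements, size            # last entry: number of sets allowed

def multicover_exists(multisets, requirements, budget):
    """Brute force: do at most `budget` multisets meet both requirements?"""
    for count in range(budget + 1):
        for chosen in combinations(multisets, count):
            if all(sum(s[e] for s in chosen) >= requirements[e] for e in range(2)):
                return True
    return False

def subset_sum_exists(numbers, target, size):
    """Brute force for the assumed variant: exactly `size` numbers sum to `target`."""
    return any(sum(c) == target for c in combinations(numbers, size))
```

Under these assumptions every multiset contributes `big` in total, while the two requirements add up to `size * big`, so a cover with at most `size` multisets uses exactly `size` of them; the first requirement then forces a sum of at least `target` and the second a sum of at most `target`.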

Often we do not need the full flexibility of WMM; for instance, in the next section we will describe several problems from computational social choice that can be reduced to more specific variants of WMM. In particular, we will demonstrate a few examples where it suffices to use Weighted Set Multicover, a variant of WMM where each element has multiplicity 0 or 1 in each input multiset (in other words, the family $\mathcal{S}$ contains sets without multiplicities, but the union operation takes multiplicities into account). We will also use a restricted variant of Multiset Multicover, where for each multiset $S_j$ in the input instance there is a number $t_j$ such that for each element $x$ we have $S_j(x) \in \{0, t_j\}$ (in other words, elements within a single multiset have the same multiplicity). We refer to this variant of the problem as Uniform Multiset Multicover.

As a first application of our new framework, we show that Weighted Set Multicover is fixed-parameter tractable when parameterized by the universe size. Notably, we only use convex constraints in the constructed MIPwSPLT instance.

Theorem 3.

Weighted Set Multicover is fixed-parameter tractable when parameterized by the universe size.

Proof.

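Consider an instance of Weighted Set Multicover with universe $U = \{x_1, \ldots, x_n\}$, family $\mathcal{S} = \{S_1, \ldots, S_m\}$ of subsets, weights $w_1, \ldots, w_m$ for the sets, covering requirements $r_1, \ldots, r_n$ for the elements, and budget $B$. Our algorithm proceeds by solving an appropriate piecewise linear integer program.

First, we form a family $U_1, \ldots, U_{2^n}$ of all subsets of $U$. For each $i$, $1 \le i \le 2^n$, let $\mathcal{S}(U_i) = \{S_j \in \mathcal{S} : S_j = U_i\}$. For each $i$, $1 \le i \le 2^n$, we define a piecewise linear convex function $f_i$ so that for each $t$, $0 \le t \le |\mathcal{S}(U_i)|$, $f_i(t)$ is the sum of the $t$ lowest weights of the sets from $\mathcal{S}(U_i)$.

We have integer variables $z_i$, $1 \le i \le 2^n$. Intuitively, these variables describe how many sets we take from each type (i.e., how many sets we take from each family $\mathcal{S}(U_i)$).

We introduce the following constraints. For each $i$, $1 \le i \le 2^n$, we have constraints $z_i \ge 0$ and $z_i \le |\mathcal{S}(U_i)|$. For each element $x_\ell$ of the universe, we also have constraint $\sum_{U_i \ni x_\ell} z_i \ge r_\ell$. These constraints ensure that the variables $z_i$ describe a possible solution for the problem (disregarding the budget). Our final constraint uses the functions $f_i$ to express the requirement that the solution has cost at most $B$:

$$\sum_{i=1}^{2^n} f_i(z_i) \le B.$$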

Finally, we use Theorem 2 to get the statement of the theorem. ∎
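The construction in this proof can be illustrated with a short Python sketch. It groups sets by type, builds the convex cost function of each type as prefix sums of sorted weights, and then enumerates the type counters by brute force. The enumeration merely stands in for the call to Theorem 2, so this sketch is not an FPT algorithm; all function names are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate, product

def cheapest_prefix_costs(weights):
    """Cost function of one type: entry t is the total weight of the t cheapest
    sets. Sorted increments make this sequence convex."""
    return [0] + list(accumulate(sorted(weights)))

def weighted_set_multicover(universe, sets, weights, requirements, budget):
    """Decide Weighted Set Multicover by enumerating how many sets of each
    type (distinct subset of the universe) are taken."""
    by_type = defaultdict(list)
    for s, w in zip(sets, weights):
        by_type[frozenset(s)].append(w)
    types = list(by_type)
    cost = [cheapest_prefix_costs(by_type[t]) for t in types]
    # counts[i] ranges over 0..(number of sets of type i), like the ILP variables.
    for counts in product(*(range(len(c)) for c in cost)):
        covered = all(
            sum(z for t, z in zip(types, counts) if x in t) >= requirements[x]
            for x in universe
        )
        if covered and sum(c[z] for c, z in zip(cost, counts)) <= budget:
            return True
    return False
```

For instance, `cheapest_prefix_costs([5, 1, 4])` returns `[0, 1, 5, 10]`, whose increments 1, 4, 5 are nondecreasing, which is the convexity that the budget constraint relies on.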

As a second application of our new framework, we show that Uniform Multiset Multicover is fixed-parameter tractable when parameterized by the universe size. Notably, we only use concave constraints in the corresponding MIPwSPLT.

Theorem 4.

Uniform Multiset Multicover is fixed-parameter tractable when parameterized by the universe size.

Proof.

Consider an instance of Uniform Multiset Multicover with universe $U = \{x_1, \ldots, x_n\}$, family $\mathcal{S} = \{S_1, \ldots, S_m\}$ of multisets, covering requirements $r_1, \ldots, r_n$ for the elements, and budget $B$. Our algorithm proceeds by solving an appropriate piecewise linear integer program.

Similarly as for Theorem 3, we form a family $U_1, \ldots, U_{2^n}$ of all the subsets of $U$ (note that these, indeed, are subsets and not multisets). For each $i$, $1 \le i \le 2^n$, let $\mathcal{S}(U_i)$ be a subfamily of $\mathcal{S}$ that contains those multisets in which exactly the elements from $U_i$ appear (that is, their multiplicities are non-zero). For each $i$, $1 \le i \le 2^n$, we define a piecewise linear concave function $g_i$ so that for each $t$, $0 \le t \le |\mathcal{S}(U_i)|$, $g_i(t)$ denotes the maximum sum of multiplicities for each element from $U_i$ using $t$ multisets from $\mathcal{S}(U_i)$. (To compute this function, we simply need to sort the multisets from $\mathcal{S}(U_i)$ in decreasing order of their multiplicities. Then, $g_i(t)$ is the sum of the multiplicities with respect to an arbitrary element from $U_i$ of the first $t$ multisets.)

We have integer variables $z_i$, $1 \le i \le 2^n$. Intuitively, the $z_i$ variables describe how many multisets we take from each type. Thus, $g_i(z_i)$ describes how much each element from $U_i$ is covered by taking $z_i$ multisets of type $U_i$.

We introduce the following constraints. For each $i$, $1 \le i \le 2^n$, we have constraints $0 \le z_i \le |\mathcal{S}(U_i)|$. For each element $x_\ell$ of the universe, we also have constraint $\sum_{U_i \ni x_\ell} g_i(z_i) \ge r_\ell$. These constraints ensure that the variables $z_i$ describe a possible solution for the problem (disregarding the budget). To express the requirement that the solution has cost at most $B$, we add the constraint $\sum_{i=1}^{2^n} z_i \le B$.

Finally, we use Theorem 2 to get the statement of the theorem. ∎
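The concave counterpart can be sketched in the same style. Here each multiset is represented by a hypothetical pair `(support, multiplicity)`, the coverage function of a type is the prefix sum of multiplicities sorted in decreasing order, and brute-force enumeration again stands in for the call to Theorem 2.

```python
from collections import defaultdict
from itertools import accumulate, product

def best_prefix_coverage(multiplicities):
    """Coverage function of one type: entry t is the largest coverage obtainable
    from t multisets. Decreasing order makes the increments nonincreasing,
    hence the sequence is concave."""
    return [0] + list(accumulate(sorted(multiplicities, reverse=True)))

def uniform_multiset_multicover(universe, multisets, requirements, budget):
    """multisets: list of (support, multiplicity) pairs; every element of
    `support` occurs `multiplicity` times. Unit weights, at most `budget` picks."""
    by_type = defaultdict(list)
    for support, mult in multisets:
        by_type[frozenset(support)].append(mult)
    types = list(by_type)
    gain = [best_prefix_coverage(by_type[t]) for t in types]
    for counts in product(*(range(len(g)) for g in gain)):
        if sum(counts) > budget:
            continue
        if all(
            sum(g[z] for t, g, z in zip(types, gain, counts) if x in t) >= requirements[x]
            for x in universe
        ):
            return True
    return False
```

For example, `best_prefix_coverage([1, 3, 2])` returns `[0, 3, 5, 6]`, with increments 3, 2, 1 that never increase.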

Unfortunately, it is impossible to apply our approach to the more general Multiset Multicover; by Proposition 2, Multiset Multicover is already NP-hard for two-element universes. It is, however, possible to obtain a certain form of an approximation scheme.

Definition 2.

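Let $\epsilon$ be a real number, $\epsilon > 0$. We say that algorithm $\mathcal{A}$ is an $\epsilon$-almost-cover algorithm for Multiset Multicover if, given an input instance with universe $U = \{x_1, \ldots, x_n\}$ and covering requirements $r_1, \ldots, r_n$, it outputs a solution that covers each element $x_i$ with multiplicity $r'_i$ such that $\sum_{i=1}^{n} \max(0, r_i - r'_i) < \epsilon \sum_{i=1}^{n} r_i$.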

In other words, on average an -almost-cover algorithm can miss each element of the universe by an -fraction of its covering requirement. When all the elements must be covered perfectly, we might first run an -almost-cover algorithm and then complete its solution, for example greedily, since the remaining instance might be much easier to solve.
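The condition can be sketched as a simple check. The exact inequality of Definition 2 does not survive in the text above, so this sketch assumes one natural reading of the "on average" paraphrase: the total shortfall, summed over all elements, is at most an `eps` fraction of the total covering requirement, and over-covering one element does not compensate for missing another. The function names are hypothetical.

```python
def shortfall(requirements, achieved):
    # Total amount by which elements are under-covered; surplus is ignored.
    return sum(max(0, r - a) for r, a in zip(requirements, achieved))


def is_almost_cover(requirements, achieved, eps):
    # Assumed reading of the definition: total shortfall <= eps * total requirement.
    return shortfall(requirements, achieved) <= eps * sum(requirements)


requirements = [10, 20, 30]
achieved = [10, 18, 35]   # element 1 is missed by 2, element 2 is over-covered
print(shortfall(requirements, achieved))              # 2
print(is_almost_cover(requirements, achieved, 0.05))  # True
print(is_almost_cover(requirements, achieved, 0.01))  # False
```

Under this reading, the value returned by `shortfall` is also the amount a subsequent completion step would still have to cover.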

The key idea regarding computing an -almost-cover is that it suffices to replace each input multiset by several sub-multisets, each with a particular “precision level,” so that multiplicities of the elements in each sub-multiset are of a similar order of magnitude. The full argument, however, is the most technical part of our paper.

Theorem 5.

For every rational , there is an FPT-time -almost-cover algorithm for Multiset Multicover parameterized by the universe size.

Proof.

Throughout this proof we describe our -almost-cover algorithm for Multiset Multicover. We consider an instance of Multiset Multicover with a family of multisets over the universe , where the covering requirements for the elements of the universe are . We associate each set from the family with the vector of element multiplicities.

Let be the desired approximation ratio. We fix and . Notice that and . Let and let be a sequence of all -dimensional vectors whose entries come from the -element set . For each , , we write . Intuitively, these vectors describe some subset of “shapes” of all possible multisets—interpreted as vectors of multiplicities—over our -element universe. For each number , we write to mean the vector .

Intuitively, vectors of the form  are approximations of those multisets for which the positive multiplicities of the elements do not differ too much (formally, for those multisets for which the positive multiplicities differ by at most a factor of ). Indeed, for each such multiset , we can find a value and a vector such that for each element it holds that . However, this way we cannot easily approximate those sets for which multiplicities differ by a large factor. For example, consider a set represented through the vector , where , in particular if . For each value and each vector , the vector will be inaccurate with respect to the multiplicity of element or inaccurate with respect to the multiplicity of element (or inaccurate with respect to both these multiplicities).
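The limitation described above can be illustrated with a small Python sketch. The actual set of admissible entries of the shape vectors is not recoverable from the text, so the sketch assumes a geometric grid: each positive entry is a power of `1 + beta` with exponent below `levels`, and the scaling factor is the smallest positive multiplicity. The parameters `beta` and `levels` and the name `round_to_shape` are hypothetical.

```python
def round_to_shape(vec, beta, levels):
    """Try to under-approximate `vec` by scale * shape, where every positive
    shape entry is (1 + beta)**k for some 0 <= k < levels (assumed grid).

    Returns (scale, shape), or None when the positive multiplicities are
    spread too widely for a single scaled shape.
    """
    positive = [v for v in vec if v > 0]
    if not positive:
        return None
    scale = min(positive)
    shape = []
    for v in vec:
        if v == 0:
            shape.append(0.0)
            continue
        power, k = 1.0, 0
        # Largest grid value that does not exceed the ratio v / scale.
        while power * (1 + beta) <= v / scale:
            power *= 1 + beta
            k += 1
        if k >= levels:
            return None          # ratio too large for the available grid
        shape.append(power)
    return scale, shape


print(round_to_shape([4, 0, 5], beta=0.5, levels=3))     # (4, [1.0, 0.0, 1.0])
print(round_to_shape([1, 0, 1000], beta=0.5, levels=3))  # None
```

The second call fails because the two positive multiplicities differ by a factor of 1000, which is the situation that motivates splitting such a multiset into several scaled shapes.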

The main step of our algorithm is to modify the instance so that we replace each multiset  from the family with a sequence of vectors of the form that altogether add to at most the multiset  (each such sequence can contain multiple vectors of different “shapes” and of different scaling factors ). The goal is to obtain an instance that on the one hand consists of “nicely-structured” sets (vectors) only, and on the other hand has the following property: if in the initial instance there exist sets that cover elements with multiplicities , then in the new instance there exist sets that cover elements with multiplicities , such that . We refer to this as the almost-cover approximation property.

The procedure for replacing a given set is presented as Algorithm 1. This algorithm calls the Emit function with arguments for each vector that it wants to output ( is always one of the vectors ). The emitted sets replace the set from the input. Below we show that if we apply Algorithm 1 to each set from , then the resulting instance has our almost-cover approximation property.

1 Main():
2     multip ;
      sorted sort() ;
      // sort in ascending order of multiplicities
3     ;
      // refers to the ’th item’s number
      // refers to its multiplicity
4     while  do
5         ;
7     Main_Rec(, ) ;
10 Main_Rec(, ):
11     (vector of zeros). ;
12     multip[].second ;
13     [multip[].first] 1 ;
14     ;
15     while  do
16         if  then
17             [multip[].first] ;
18             ;
20         else
21             for  to  do
22                 [multip[].first] ;
24             Round_And_Emit(, ) ;
25             for  to  do
26                 ;
28             Main_Rec(, ) ;
29             return
30     Round_And_Emit(, );
33 Round_And_Emit(, ):
34     for  to  do
35         ;
37     Emit(, );
Algorithm 1: The transformation algorithm used in the proof of Theorem 5. The algorithm replaces a given set with a sequence of vectors of the form .

Let us consider how Algorithm 1 proceeds on a given set . For the sake of clarity, let us assume there is no rounding performed by Algorithm 1 in function Round_And_Emit (the loop in line 1). We will come back to this issue later.

The algorithm considers the elements of the universe (indexed by variable throughout the algorithm) in the order given by the vector “sorted” (formed in line 1 of Algorithm 1). Let be the order in which Algorithm 1 considers the elements (so means that is considered before ), and let be the elements from the universe renamed so that . Let be the number of sets that Algorithm 1 emits on our input set and let these sets be . (This is depicted in Figure 2, where for the sake of the example we take and .)

Consider the situation where the algorithm emits the ’th set, , and let be the value of variable right before the call to Round_And_Emit that caused to be emitted. Note that each element from the universe such that has the same multiplicity in as element (line 1 of Algorithm 1). Let be the sum of the multiplicities of the elements from . We make the following observations:

Observation 1:

.

Observation 2:

It holds that .

Observation 3:

We have that . Further, we have that .

Observation 4:

For it holds that .

Figure 2: An example for Algorithm 1: The algorithm replaces with sets , , and .

Now let us consider some solution for instance that consists of sets, . These sets, altogether, cover all the elements from the universe with required multiplicities, that is, it holds that for each we have . For each set and for each element from the universe, we pick an arbitrary number so that altogether the following conditions hold:

  1. For every set and every , .

  2. For every , .

Intuitively, for a given set , the values describe the multiplicities of the elements from  that are actually used to cover the elements. Based on these numbers, we will show how to replace each set from with one of the sets emitted for it, so that the resulting family of sets has the almost-cover approximation property.
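One way to pick such numbers is sketched below in Python. The two conditions are not legible in the text above, so the sketch assumes what the intuition suggests: a used multiplicity never exceeds the multiplicity the set actually offers, and for each element the used multiplicities sum to exactly its covering requirement. Sets are represented as vectors of multiplicities, and the name `used_multiplicities` is hypothetical.

```python
def used_multiplicities(solution, requirements):
    """Greedily choose, for every set and element, how much of the set's
    multiplicity is actually used (assumed conditions: used <= offered,
    and per element the used amounts sum to the requirement).

    solution: list of multiplicity vectors, one per chosen set.
    """
    remaining = list(requirements)
    used = []
    for vec in solution:
        row = []
        for i, mult in enumerate(vec):
            take = min(mult, remaining[i])  # never use more than is still needed
            remaining[i] -= take
            row.append(take)
        used.append(row)
    if any(remaining):
        raise ValueError("the given sets do not cover all requirements")
    return used


print(used_multiplicities([[3, 1], [2, 4]], [4, 3]))  # [[3, 1], [1, 2]]
```

Any other choice satisfying the same two conditions would serve equally well, since the numbers are described as arbitrary.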

Figure 3: The cases in the proof of Theorem 5. The bullets represent values .

Consider a set for which Algorithm 1 emits sets, . As in the discussion of Algorithm 1, let be the elements from the universe in the order in which Algorithm 1 considers them (when emitting sets for ). We write to mean the value such that . Let , let , and let be the set from defined in the following way:

  1. If for every set we have , then is the set with the greatest value (the set that covers element with the greatest multiplicity). This is the case denoted as “Case (c)” in Figure 3.

  2. Otherwise is the set that has the lowest value , yet no lower than . This is the case denoted as either “Case (a)” or “Case (b)” in Figure