1 Introduction
Set cover is an extensively studied problem in combinatorial optimization. In this paper, we study a variant of the set cover problem, namely the
partial set multicover problem, which is defined as follows.
Definition 1.1 (Partial Set MultiCover (PSMC)).
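Suppose $E$ is an element set, $\mathcal{S}$ is a collection of subsets of $E$, each set $S\in\mathcal{S}$ has a cost $c_S$, and each element $e\in E$ has a positive covering requirement $r_e$. For a subcollection $\mathcal{F}\subseteq\mathcal{S}$, denote by $\mathcal{F}_e$ those sets of $\mathcal{F}$ containing element $e$. If $|\mathcal{F}_e|\ge r_e$, we say that $e$ is fully covered by $\mathcal{F}$. The cost of subcollection $\mathcal{F}$ is $c(\mathcal{F})=\sum_{S\in\mathcal{F}}c_S$. Given $E$ with $|E|=n$ and a real number $q$ which is a constant between 0 and 1, the PSMC problem is to find a minimum cost subcollection $\mathcal{F}$ such that $|\{e\in E\colon |\mathcal{F}_e|\ge r_e\}|\ge qn$. An instance of PSMC is denoted as $(E,\mathcal{S},c,r,q)$.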
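To make Definition 1.1 concrete, the following Python sketch checks which elements a subcollection fully covers and solves a tiny PSMC instance by exhaustive search. It is only an illustration under our own assumptions: sets are Python sets indexed by position, costs are a list, requirements are a dict, and the function names are hypothetical. The search is exponential in the number of sets and is not the algorithm of this paper.

```python
from fractions import Fraction
from itertools import combinations

def fully_covered(sets, requirement, chosen):
    """Elements e whose requirement r_e is met by the chosen set indices (|F_e| >= r_e)."""
    count = {e: 0 for e in requirement}
    for i in chosen:
        for e in sets[i]:
            count[e] += 1
    return {e for e, r in requirement.items() if count[e] >= r}

def psmc_brute_force(sets, cost, requirement, q):
    """Minimum-cost subcollection fully covering at least q*n elements (exhaustive search).

    q is converted with Fraction so that q*n is compared exactly; pass e.g. "1/2" or 1.
    Returns (chosen indices, cost), or (None, inf) if no subcollection is feasible.
    """
    q = Fraction(q)
    n = len(requirement)
    best, best_cost = None, float("inf")
    for k in range(len(sets) + 1):
        for chosen in combinations(range(len(sets)), k):
            c = sum(cost[i] for i in chosen)
            if c < best_cost and len(fully_covered(sets, requirement, chosen)) >= q * n:
                best, best_cost = chosen, c
    return best, best_cost
```

For instance, with sets {1,2}, {2,3}, {1,3}, {3,4}, unit costs, requirements (2,2,2,1) and q = 1/2, the search returns the sets with indices 1 and 3 at cost 2.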
The PSMC problem includes two important variants of the set cover problem. When $r_e\equiv 1$, it is the partial set cover problem (PSC). When $q=1$, it is the set multicover problem (SMC). One motivation for PSC comes from the phenomenon that in the real world, “satisfying all requirements” may be too costly or even impossible, due to resource limitations or political policy [4]. SMC comes from the requirement of fault tolerance in practice [28]. There is a large body of research on PSC and SMC, achieving performance ratios that match the lower bounds for the classic set cover problem, namely $\ln n$ and $f$, where $n$ is the number of elements and $f$ is the maximum number of sets containing a common element. However, studies on the combination of these two problems are very rare. According to our recent paper [25], under the ETH assumption, the PSMC problem cannot be approximated within factor for some constant .
The aim of this paper is to explore a greedy strategy on PSMC.
1.1 Related Work
The set cover problem (SC) was one of the first 21 problems proved to be NP-hard in Karp’s seminal paper [16]. In fact, Feige [11] proved that it cannot be approximated within factor $(1-\varepsilon)\ln n$ for any constant $\varepsilon>0$ unless $NP\subseteq DTIME(n^{O(\log\log n)})$, where $n$ is the number of elements. Dinur and Steurer [9] proved the same lower bound under the assumption that $P\neq NP$. Khot and Regev [18] showed that it cannot be approximated within factor $f-\varepsilon$ for any constant $\varepsilon>0$ assuming that the unique games conjecture is true, where $f$ is the maximum number of sets containing a common element. On the other hand, the greedy strategy achieves performance ratio $H(\Delta)$ [7, 15, 20], where $\Delta$ is the maximum cardinality of a set and $H(\Delta)=\sum_{i=1}^{\Delta}1/i$ is the Harmonic number. An $f$-approximation can also be obtained by either the LP rounding method [13] or the local ratio method [3].
In paper [10], Dobson first gave an $H(K)$-approximation algorithm for the multiset multicover problem (MSMC), where $K$ is the maximum size of a multiset. Rajagopalan and Vazirani [22]
gave a greedy algorithm achieving the same performance ratio using dual fitting analysis, which implies that the integrality gap of the classic linear program of MSMC is at most $H(K)$.
For the partial set cover problem, Kearns [17] gave a greedy algorithm achieving performance ratio $2H(n)+3$. By modifying the greedy algorithm slightly, Slavík [27] improved the performance ratio to $H(\min\{\lceil pn\rceil,\Delta\})$, where $p$ is the percentage of elements required to be covered. Gandhi et al. [12] proposed a primal-dual algorithm achieving performance ratio $f$. Bar-Yehuda [2] studied a generalized version in which each element has a profit and the total profit of covered elements should exceed a threshold. Using the local ratio method, he also obtained performance ratio $f$. Könemann et al. [19] presented a Lagrangian relaxation framework and obtained performance ratio $(4/3+\varepsilon)H(\Delta)$ for the generalized partial set cover problem.
From the above related work, it can be seen that both PSC and SMC admit performance ratios matching the best ratios for the classic set cover problem. However, combining partial set cover with set multicover makes the problem considerably harder. Ran et al. [23] were the first to study approximation algorithms for PSMC, using a greedy strategy and dual-fitting analysis. However, their ratio is meaningful only when the covering percentage is very close to 1. In paper [24], the authors presented a simple greedy algorithm achieving performance ratio . They also presented a local ratio algorithm, which reveals what they called a “shock wave” phenomenon: their performance ratio is for both PSC and SMC , however, when is smaller than 1 by a very small constant, the ratio jumps abruptly to . In our recent paper [25], we proved that PSMC cannot have a better than polynomial performance ratio by a reduction from the well-known densest subgraph problem.
1.2 Our Contribution and Techniques
The contributions of this paper are summarized as follows.

A new problem called minimum density subcollection (MDSC) is defined, which is to find a subcollection $\mathcal{F}$ minimizing the ratio $c(\mathcal{F})/|C(\mathcal{F})|$, where $C(\mathcal{F})$ is the set of elements fully covered by $\mathcal{F}$. We prove that MDSC is also NP-hard.

We show that if MDSC has an approximation algorithm, then PSMC has an bicriteria approximation algorithm, that is, the output of our algorithm has cost at most times that of an optimal solution, while the total number of fully covered elements is at least , where is the covering ratio required by the problem and is the total number of elements.

We design an approximation algorithm for MDSC, where is the maximum covering requirement of elements. Combining this result with the above, PSMC has an bicriteria algorithm.
Our algorithm uses a greedy strategy. However, it is not obvious which sets should be chosen in each iteration. As indicated by previous studies [23], a natural generalization of the classic greedy algorithm cannot yield good results. One reason might be that the number of elements fully covered by a subcollection, viewed as a function of the subcollection, is not submodular. In this paper, our greedy algorithm iteratively picks an approximate solution to the MDSC problem until the number of fully covered elements reaches a certain level. An obstacle to obtaining a good approximation factor lies in the last iteration: the subcollection chosen in the last iteration might fully cover many more elements than required. Although its density is low, its cost might be too large to be bounded by the optimal value. So, we stop the algorithm when at least elements are fully covered, thus leading to a bicriteria approximation algorithm.
A crucial stepping stone to the above algorithm is the MDSC problem, which is also NP-hard. An example can be constructed showing that a natural LP formulation has an arbitrarily large integrality gap. To overcome this difficulty, we formulate the problem as a linear program in a “flow”-like language and make use of an approximation algorithm for the minimum node-weighted Steiner network problem as a subroutine, yielding an approximation algorithm for MDSC with a guaranteed performance ratio.
The paper is organized as follows. In Section 2, we give the definition of MDSC and prove its NPhardness. In Section 3, we show how an approximation algorithm for MDSC leads to an bicriteria algorithm for PSMC. In Section 4, we propose an approximation algorithm for MDSC. In Section 5, the paper is concluded with some discussions on future work.
A preliminary version of this paper was presented at INFOCOM 2017. That version contains a flaw, and the appendix explains where it lies.
2 Preliminaries
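For simplicity of statement, we shall use $C(\mathcal{F})$ to denote the set of elements fully covered by subcollection $\mathcal{F}$. Define the density of a subcollection $\mathcal{F}$ as
$$\mathrm{den}(\mathcal{F})=\frac{c(\mathcal{F})}{|C(\mathcal{F})|}.$$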
Definition 2.1 (Minimum Density SubCollection (MDSC)).
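Given an element set $E$ and a collection $\mathcal{S}$ of subsets of $E$ with costs $c$ and covering requirements $r$, the MDSC problem is to find a subcollection $\mathcal{F}\subseteq\mathcal{S}$ with the minimum density.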
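The density objective can likewise be illustrated by brute force. The sketch below uses hypothetical names and exact enumeration rather than any approximation algorithm; it returns the density (cost divided by the number of fully covered elements) as an exact fraction and treats a subcollection that fully covers no element as having undefined density.

```python
from fractions import Fraction
from itertools import combinations

def covered_elements(sets, requirement, chosen):
    """Elements whose covering requirement is met by the chosen set indices."""
    count = {}
    for i in chosen:
        for e in sets[i]:
            count[e] = count.get(e, 0) + 1
    return {e for e, r in requirement.items() if count.get(e, 0) >= r}

def density(sets, cost, requirement, chosen):
    """Cost of the chosen sets divided by the number of fully covered elements, or None."""
    covered = covered_elements(sets, requirement, chosen)
    if not covered:
        return None
    return Fraction(sum(cost[i] for i in chosen)) / len(covered)

def mdsc_brute_force(sets, cost, requirement):
    """Exhaustively find a nonempty subcollection of minimum density."""
    best, best_den = None, None
    for k in range(1, len(sets) + 1):
        for chosen in combinations(range(len(sets)), k):
            d = density(sets, cost, requirement, chosen)
            if d is not None and (best_den is None or d < best_den):
                best, best_den = chosen, d
    return best, best_den
```

On sets {1,2}, {2,3}, {1,3}, {3,4} with costs (1,1,1,3) and requirements (2,2,2,1), the minimum density is 1, attained by the first three sets together, although the single set {3,4} already fully covers element 4 at density 3. No single set among the first three fully covers anything on its own, which reflects why MDSC must reason about subcollections rather than individual sets.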
Unfortunately, MDSC is also NP-hard.
Theorem 2.2.
The MDSC problem is NP-hard.
Proof.
We reduce the perfect 3-dimensional matching problem (which is APX-hard [1]) to MDSC. Given an integer , three sets each having cardinality , and a set , the perfect 3-dimensional matching problem asks whether there is a subset with such that for any elements , , , and . Construct an instance of MDSC as follows. Let and . The covering requirement and for . The cost for all .
Next, we show that there is a perfect 3-dimensional matching if and only if the optimal value for the MDSC problem is . In fact, if is a perfect 3-dimensional matching, then has and . Suppose the instance does not have a perfect 3-dimensional matching. Consider an arbitrary subcollection and its corresponding subset . Then . If , then . If , then and thus . In this case, . The claimed result is proved. ∎
3 Bicriteria Algorithm for PSMC
In this section, we make use of an approximation algorithm for MDSC to design a bicriteria algorithm for PSMC.
3.1 The Algorithm
The algorithm is presented in Algorithm 1. It follows the classic greedy strategy. The main difference is that instead of choosing sets one by one, in each iteration it runs an approximation algorithm for MDSC to greedily choose a subcollection. After each iteration, the instance is updated with respect to the current subcollection to form a reduced instance , where is the set of elements not yet fully covered, the total remaining covering ratio
(1) 
the remaining covering requirement for element is , and those elements which have been fully covered by have to be removed from each set. In the following, when we mention a reduced instance or when we say that the instance is updated, it is always understood that the above operations are executed. When the algorithm terminates, we have , and thus the number of fully covered elements is at least by the expression of reduced covering ratio defined in (1).
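The greedy framework of Algorithm 1 can be sketched as follows. This is a toy illustration under explicit assumptions: an exact MDSC solver by enumeration stands in for the approximation algorithm used in each iteration, and the stopping threshold is passed in as a hypothetical parameter `stop_count`, since its precise value belongs to Algorithm 1 and is not reproduced here. After each pick, the sketch performs the updates described above: residual requirements are lowered, chosen sets leave the instance, and fully covered elements are removed from every remaining set.

```python
from fractions import Fraction
from itertools import combinations

def mdsc_exact(sets, cost, residual):
    """Exact MDSC on a reduced instance by enumeration (stand-in for an approximation).

    sets: dict set_id -> frozenset of not-yet-covered elements it contains
    residual: dict element -> remaining covering requirement (always >= 1)
    Returns the tuple of set ids with minimum density, or None if nothing can be covered.
    """
    ids = sorted(sets)
    best, best_den = None, None
    for k in range(1, len(ids) + 1):
        for chosen in combinations(ids, k):
            count = {}
            for i in chosen:
                for e in sets[i]:
                    count[e] = count.get(e, 0) + 1
            newly = [e for e, r in residual.items() if count.get(e, 0) >= r]
            if newly:
                d = Fraction(sum(cost[i] for i in chosen)) / len(newly)
                if best_den is None or d < best_den:
                    best, best_den = chosen, d
    return best

def greedy_psmc(sets, cost, requirement, stop_count):
    """Repeatedly add a minimum-density subcollection until stop_count elements are fully covered."""
    residual = dict(requirement)
    remaining = {i: frozenset(s) for i, s in enumerate(sets)}
    solution, covered = [], set()
    while len(covered) < stop_count:
        chosen = mdsc_exact(remaining, cost, residual)
        if chosen is None:
            raise ValueError("stop_count cannot be reached on this instance")
        solution.extend(chosen)
        for i in chosen:
            # a chosen set leaves the instance and lowers the residual requirement of its elements
            for e in remaining.pop(i):
                residual[e] -= 1
        newly = {e for e, r in residual.items() if r <= 0}
        covered |= newly
        for e in newly:
            del residual[e]
        # fully covered elements are removed from every remaining set
        remaining = {i: s - newly for i, s in remaining.items()}
    return sorted(solution), covered
```

On sets {1,2}, {2,3}, {1,3}, {3,4} with costs (1,1,1,3) and requirements (2,2,2,1), `stop_count=3` stops after one iteration with sets 0, 1, 2, while `stop_count=4` needs a second iteration, run on the reduced instance, that adds set 3.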
3.2 Performance Ratio Analysis
Suppose Algorithm 1 is executed times, selecting subcollections . We estimate costs
and separately. In the following, denotes an optimal solution to PSMC, and is the optimal cost.
Lemma 3.1.
Proof.
For , denote by the number of elements remaining to be fully covered after is selected. Then for where
(2)  
(3) 
After the th iteration, is a subcollection fulfilling the remaining covering requirement . So the density of an optimal solution to the MDSC problem in the th iteration is upper bounded by . Since approximates the density of within a factor of , we have
(4) 
Combining this with inequalities (2), (3) we have
The lemma is proved. ∎
Lemma 3.2.
.
Proof.
Theorem 3.3.
Using an approximation algorithm for MDSC, the PSMC problem admits an bicriteria approximation.
For small , the performance ratio in the above theorem can be viewed as since is a constant.
4 Approximation Algorithm for MDSC
In this section, we present an approximation algorithm for MDSC. The algorithm is based on an LP formulation and makes use of a node-weighted Steiner network algorithm.
4.1 LP Formulation
The following is a natural integer program formulation of MDSC.
(5)  
Here $x_S$ indicates whether set $S$ is selected and $y_e$ indicates whether element $e$ is fully covered. The first constraint says that if $y_e=1$ then at least $r_e$ sets containing $e$ must be selected, and thus $e$ is fully covered. Relaxing (5) and rescaling, we obtain the following linear program:
(6)  
However, the following example shows that the integrality gap between (5) and (6) can be arbitrarily large.
Example 4.1.
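Hence, to obtain a good approximation, we need a different program. In the following, we formulate the problem in a more involved, flow-like language. For an element $e$, an $e$-coverset is a subcollection $A\subseteq\mathcal{S}_e$ with $|A|=r_e$, which fully covers $e$, where $\mathcal{S}_e$ denotes the sets of $\mathcal{S}$ containing $e$. Denote by $\mathcal{A}_e$ the family of all $e$-coversets, and let $\mathcal{A}=\bigcup_{e\in E}\mathcal{A}_e$. Consider the following example.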
Example 4.2.
. with , and , and and . For this example, , and . It should be emphasized that the same coverset belonging to different ’s will be viewed as different coversets. For example, belongs to both and . To distinguish them, we shall use to denote coversets in . For example, contains three coversets , and , contains one coverset , contains two coversets and .
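Since the data of Example 4.2 are not reproduced here, the sketch below uses small hypothetical instances. It enumerates coversets assuming, as in the definition above, that a coverset for element e consists of exactly r_e sets containing e. Keying the families by element, and tagging each coverset with its element when flattening, keeps identical subcollections from different families apart, as the text requires.

```python
from itertools import combinations

def coversets(sets, requirement):
    """Map each element e to its coversets: all r_e-subsets of the indices of sets containing e."""
    family = {}
    for e, r in requirement.items():
        containing = [i for i, s in enumerate(sets) if e in s]
        family[e] = [frozenset(c) for c in combinations(containing, r)]
    return family

def tagged(family):
    """Flatten the families into (element, coverset) pairs, so equal subcollections
    belonging to different elements remain distinct entries."""
    return [(e, A) for e, members in family.items() for A in members]
```

For two identical sets {1,2} with both requirements equal to 2, the same pair of sets is a coverset for element 1 and for element 2, and `tagged` lists it twice. In general the family for e has $\binom{|\mathcal{S}_e|}{r_e}$ members, which is why the linear program below has exponentially many variables.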
The following is an integer program for constrained MDSC:
(7)  
In fact, indicates whether a coverset is selected and indicates whether set is selected. The first constraint says that if then at least one coverset is selected and thus is fully covered. The family of selected sets is the union of all those selected coversets. So, if belongs to some selected coverset, then should be , namely,
(8) 
Notice that to fully cover element , it is sufficient to select exactly one coverset from . So, we may replace (8) by the second constraint of (7) for the purpose of linearization. The objective function is exactly the density of the selected sets.
Consider Example 4.2 again. Setting and all other values to be 0 implies that the selected subcollection and are fully covered. By the second constraint, and we may take (to minimize the objective function, it is better to take to be 0 if the right hand side of the second constraint is 0). By the first constraint, and we may take (to minimize the objective function, it is better to take to be 1 for all those elements which are fully covered). Notice that serves as both and ; the value for the former is 0 and the value for the latter is , and the two are set independently.
The above integer program (7) can be relaxed to the following linear program LP:
(9)  
Note that although there are exponentially many variables, the linear program can be solved in polynomial time, as explained below. Consider the dual program of (9):
(10)  
By LP primal-dual theory [14], one may solve (9) by solving (10), and to solve (10) in polynomial time, it suffices to construct a separation oracle for the third constraint. For any and , define . For any element $e$, a coverset minimizing can be found by choosing the $r_e$ cheapest (measured by cost ) sets containing $e$. By checking whether holds for every , we can either certify that all constraints are satisfied or find a violated constraint. Using the ellipsoid method, linear program (10) is polynomial-time solvable.
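The separation step only needs, for each element, a cheapest coverset under nonnegative weights on (element, set) pairs. The sketch below assumes that each constraint being separated has the form "the weight of a coverset for e is at least a bound for e"; the weight table and the bounds are hypothetical placeholders for the dual quantities in (10), which are not restated here. Under that assumption, taking the r_e lightest sets containing e suffices, because any other choice of r_e such sets has at least the same total weight.

```python
import heapq

def cheapest_coverset(sets, weight, e, r_e):
    """Cheapest coverset for e under nonnegative weights weight[(e, i)]: the r_e lightest sets containing e.

    Returns (frozenset of set indices, total weight), or (None, None) if fewer than r_e sets contain e.
    """
    containing = [i for i, s in enumerate(sets) if e in s]
    if len(containing) < r_e:
        return None, None
    chosen = heapq.nsmallest(r_e, containing, key=lambda i: weight[(e, i)])
    return frozenset(chosen), sum(weight[(e, i)] for i in chosen)

def separation_oracle(sets, weight, requirement, bound):
    """Return (e, coverset) for some violated constraint weight(A) >= bound[e], else None.

    `bound` is a hypothetical placeholder for the element-wise right-hand side of the dual constraint.
    """
    for e, r in requirement.items():
        A, w = cheapest_coverset(sets, weight, e, r)
        if A is not None and w < bound[e]:
            return e, A
    return None
```

Each call scans every element once and sorts only the sets containing it, so the oracle runs in polynomial time even though the number of coversets is exponential.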
Lemma 4.3.
4.2 The Algorithm
Inspired by the method of paper [6] for network design problems, we design an approximation algorithm for MDSC which makes use of an approximation algorithm for the minimum node-weighted Steiner network problem.
Definition 4.4 (Node-Weighted Steiner Network Problem (NWSN) [21]).
Given a graph $G=(V,E_G)$ with a weight function $w$ on $V$ and a connectivity requirement $r(u,v)$ for each pair of nodes $u,v\in V$, the minimum node-weighted Steiner network problem asks for a subgraph $H$ of $G$ such that every pair of nodes $u,v$ is connected by at least $r(u,v)$ edge-disjoint paths in $H$ and the node weight of $H$ is as small as possible.
Notice that $H$ must include all nodes $u$ with $r(u,v)>0$ for at least one node $v$. Such a node can be viewed as a terminal node. On the other hand, nodes $u$ with $r(u,v)=0$ for every $v$ need not be included in $H$. Such nodes are Steiner nodes. The NWSN problem is to select a set of Steiner nodes with minimum weight that satisfies the connectivity requirements between terminal nodes.
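A candidate solution can be checked against the NWSN requirements using Menger's theorem: the maximum number of edge-disjoint paths between two nodes equals the value of a maximum flow in which every undirected edge has capacity 1 in each direction. The following sketch (hypothetical names; unit augmenting paths found by breadth-first search) computes that number on the subgraph induced by a chosen node set, identifies terminal nodes, and sums node weights. Taking the induced subgraph is our simplifying assumption, natural here because only nodes carry weight.

```python
from collections import defaultdict, deque

def edge_disjoint_paths(nodes, edges, s, t):
    """Maximum number of edge-disjoint s-t paths in the subgraph induced by `nodes`."""
    if s == t:
        raise ValueError("s and t must differ")
    cap = defaultdict(int)  # residual capacity; an undirected edge starts at 1 in each direction
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            cap[(u, v)] += 1
            cap[(v, u)] += 1
            adj[u].add(v)
            adj[v].add(u)
    flow = 0
    while True:
        parent = {s: None}  # breadth-first search for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

def terminals(requirement):
    """Nodes with a positive connectivity requirement to some other node."""
    return {x for (u, v), r in requirement.items() if r > 0 for x in (u, v)}

def is_feasible(nodes, edges, requirement):
    """True if every pair (u, v) has at least r(u, v) edge-disjoint paths among `nodes`."""
    return all(edge_disjoint_paths(nodes, edges, u, v) >= r
               for (u, v), r in requirement.items() if r > 0)

def node_weight(nodes, weight):
    """Total node weight; nodes without an entry (e.g. terminals) weigh zero."""
    return sum(weight.get(x, 0) for x in nodes)
```

For example, with terminals a, b, Steiner nodes s1, s2 and edges a-s1, s1-b, a-s2, s2-b, the requirement r(a,b)=2 is met only when both Steiner nodes are kept; with weights 3 and 4 that solution has node weight 7.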
The algorithm is presented in Algorithm 2. It partitions the elements into a disjoint union of sets ’s, according to an optimal fractional solution to linear program (9). Let be a set satisfying the condition specified in line 3 of the algorithm, whose existence will be shown later. The NWSN instance used in line 4 of the algorithm is constructed as follows. Let be the graph on node set and edge set . Set the weight of every to be and the weight of all other nodes to be zero. Set the connectivity requirement for every and the connectivity requirement of all other node pairs to be zero. Denote the constructed instance as . The output of the algorithm is the subcollection of sets corresponding to the Steiner nodes in the computed Steiner network on .
The rationale behind the algorithm will be manifested through the analysis in the following subsection.
4.3 Theoretical Analysis
Notice that any feasible solution to the NWSN problem on instance induces a feasible solution to the multicover problem on instance . In fact, if element is connected to node by edge-disjoint paths, each of the form , then fully covers element . Taking the union of such sets fully covers all elements in .
To analyze the correctness and the performance ratio, we first give an LP-relaxation for the set multicover problem and an LP-relaxation for the NWSN problem.
LP-relaxation for set multicover. Similar to the construction of integer program (7), the multicover problem on instance can be formulated as an integer linear program whose relaxation is as follows:
(11)  
LP-relaxation for NWSN. Next, consider the node-weighted Steiner network problem. For each pair of nodes $u$ and $v$, a $(u,v)$-pathset is a set of $r(u,v)$ edge-disjoint paths between $u$ and $v$ in $G$. Denote by $\mathcal{P}_{uv}$ the family of all $(u,v)$-pathsets and let $\mathcal{P}$ be the union of all these families. The following linear program LP is a relaxation for the NWSN problem, which was presented in [5]:
(12)  
In fact, for the corresponding integral formulation in which and can only take values from , indicates whether pathset is chosen and indicates whether node is chosen. The model in [5] uses equality instead of inequality in the first constraint, whose meaning is that for each pair of nodes and , exactly one pathset is chosen, and thus the connectivity requirement between and is satisfied. The second constraint says that if node belongs to some chosen pathset, then must be chosen. Hence the chosen nodes are those nodes on the union of chosen pathsets, and the objective is to minimize the weight of those chosen nodes. When relaxing variables by allowing fractional values, any optimal solution automatically has , , and . Hence it does not matter if we relax the first constraint to be inequality and do not explicitly require and to be no greater than 1.
Now, we are ready to analyze the performance ratio of Algorithm 2.
Theorem 4.5.
For , Algorithm 2 has performance ratio at most for constrained MDSC.
Proof.
We prove the theorem step by step by first establishing the following three claims.
Claim 1. An index as in Line 3 of Algorithm 2 exists and .
In fact, since is decomposed into parts , by the constraint , there exists an index such that . Since for every , we have .
Since and for each , it can be calculated that for . Hence the above .
Claim 2. , where is the optimal value of linear program (11).
Let for each set and let for each coverset . For any element and any set , we have
Since , we have for every . Hence
This implies that is a feasible solution to (11). Hence
where the last inequality comes from Lemma 4.3.
Claim 3. , where is the optimal value of linear program (12).
Corollary 4.6.
PSMC has an bicriteria algorithm.
5 Conclusion and Discussion
In this paper, we studied the partial set multicover problem (PSMC). By proposing a new NP-hard problem called minimum density subcollection (MDSC) and designing an approximation algorithm for MDSC, we obtained an bicriteria algorithm for PSMC.
Our studies show that PSMC is a very challenging problem. One reason is that it possesses an “all-or-nothing” property. As an illustration, suppose the covering requirement of each element is 10. Covering an element 9 times has the same effect as not covering it at all. So, although the algorithm may strive to pick a large number of sets, it is still possible that only very few elements have their covering requirements satisfied. Since what matters is only the number of fully covered elements, much effort might be wasted on covering elements that never become fully covered. Thus, in order to obtain a good approximation, one has to control this waste. Such an all-or-nothing phenomenon is interesting and appears frequently in the real world. New ideas are needed, and conquering such problems would have great theoretical value.
Our performance ratio depends on the maximum covering requirement . New ideas have to be further explored to design algorithms without dependence on . Designing approximation algorithms without violation is another challenging problem.
Acknowledgment
This research is supported by NSFC (11771013, 11531011, 61751303).
References
 [1] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela and M. Protasi, Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer (2003).
 [2] R. Bar-Yehuda, Using homogeneous weights for approximating the partial cover problem. Journal of Algorithms, 39 (2001) 137–144.
 [3] R. Bar-Yehuda, S. Even, A local-ratio theorem for approximating the weighted vertex cover problem. North-Holland Mathematics Studies, 109 (1985) 27–45.

 [4] M. Charikar, S. Khuller, D.M. Mount and G. Narasimhan, Algorithms for facility location problems with outliers. Proc. 12th ACM-SIAM Sympos. Discrete Algorithms (2001) 642–651.
 [5] C. Chekuri, A. Ene, A. Vakilian, Prize-collecting survivable network design in node-weighted graphs. APPROX/RANDOM, LNCS 7408 (2012) 98–109.
 [6] C. Chekuri, G. Even, A. Gupta and D. Segev, Set connectivity problems in undirected graphs and the directed Steiner network problem. ACM Trans. Algorithms 7(2) (2011) 18:1–18:17.

 [7] V. Chvátal, A greedy heuristic for the set-covering problem. Math. Oper. Res. 4 (1979) 233–235.
 [8] D.P. Williamson, D.B. Shmoys, The Design of Approximation Algorithms. Cambridge University Press, 2010.
 [9] I. Dinur, D. Steurer, Analytical approach to parallel repetition. Proc. STOC 2014 (2014) 624–633.
 [10] G. Dobson, Worst-case analysis of greedy heuristics for integer programming with nonnegative data. Math. Oper. Res. 7 (1982) 515–531.

 [11] U. Feige, A threshold of ln n for approximating set cover. Proc. 28th ACM Symposium on the Theory of Computing (1996) 312–318.
 [12] R. Gandhi, S. Khuller, A. Srinivasan, Approximation algorithms for partial covering problems. Journal of Algorithms, 53(1) (2004) 55–84.
 [13] D.S. Hochbaum, Approximation algorithms for the set covering and vertex cover problems. SIAM Journal on Computing, 11(1982) 555–556.
 [14] J.P. Ignizio, T.M. Cavalier, Linear Programming. Prentice-Hall, Inc., Upper Saddle River, NJ, USA (1994).
 [15] D.S. Johnson, Approximation algorithms for combinatorial problems, J. Comput. System Sci., 9 (1974) 256–278.
 [16] R.M. Karp, Reducibility among combinatorial problems, in Complexity of Computer Computations, R. E. Miller and J. W. Thatcher, eds., Plenum Press, New York, pp. 85–103, 1972.

 [17] M. Kearns, The Computational Complexity of Machine Learning. MIT Press, Cambridge, MA, 1990.
 [18] S. Khot, O. Regev, Vertex cover might be hard to approximate to within $2-\varepsilon$. Journal of Computer and System Sciences, 74(3) (2008) 335–349.
 [19] J. Könemann, O. Parekh, D. Segev, A unified approach to approximating partial covering problems. Algorithmica, 59 (2011) 489–509.
 [20] L. Lovász, On the ratio of optimal integral and fractional covers. Discrete Math., 13 (1975) 383–390.
 [21] Z. Nutov, Approximating Steiner networks with node weights, SIAM J. Computing, 39(7) (2010) 3001–3022.
 [22] S. Rajagopalan, V. Vazirani, Primal-dual RNC approximation algorithms for set cover and covering integer programs. SIAM J. Comput., 28 (1998) 525–540.
 [23] Y. Ran, Z. Zhang, H. Du, Y. Zhu, Approximation algorithm for partial positive influence problem in social network. Journal of Combinatorial Optimization, 33(2) (2017) 791–802.
 [24] Y. Ran, Y. Shi, Z. Zhang, Local ratio method on partial set multicover, Journal of Combinatorial Optimization, 34(1) (2017) 302–313.
 [25] Y. Ran, Y. Shi, Z. Zhang, Primal dual algorithm for partial set multicover, submitted to COCOA 2018.
 [26] V. Setty, G. Kreitz, G. Urdaneta, R. Vitenberg, M. van Steen, Maximizing the number of satisfied subscribers in pub/sub systems under capacity constraints. INFOCOM 2014, 2580–2588.
 [27] P. Slavík, Improved performance of the greedy algorithm for partial cover. Information Processing Letters, 64(5) (1997) 251–254.
 [28] Z. Zhang, J. Willson, Z. Lu, W. Wu, X. Zhu and D.-Z. Du, Approximating maximum lifetime coverage through minimizing weighted cover in homogeneous wireless sensor networks. IEEE/ACM Transactions on Networking, 24(6) (2016) 3620–3633.
Appendix: A Flaw in the Conference Version.
A preliminary version of this paper was presented at INFOCOM 2017. Making use of an approximation algorithm for MDSC, it was claimed that one can obtain an approximation algorithm for PSMC. However, there is a flaw. The algorithm in that paper greedily selects densest subcollections until at least elements are fully covered. Then it prunes the last subcollection by greedily selecting subcollections of consisting of at most sets until the covering requirement is satisfied. Suppose the subcollections obtained in the pruning step are . The approximation analysis relies on the following inequality:
(13) 
However, this is not true. Consider the following example.
Example 5.1.
with , , , , , and .
For this example, the densest subcollection of is . Then the pruning step selects and sequentially. Notice that
The reason why inequality (13) does not hold is because