Set cover is an extensively studied problem in combinatorial optimization. In this paper, we study a variant of the set cover problem, namely the partial set multi-cover problem, which is defined as follows.
Definition 1.1 (Partial Set Multi-Cover (PSMC)).
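Suppose $E$ is an element set, $\mathcal S$ is a collection of subsets of $E$, each set $S\in\mathcal S$ has a cost $c(S)$, and each element $e\in E$ has a positive covering requirement $r_e$. For a sub-collection $\mathcal F\subseteq\mathcal S$, denote by $\mathcal F_e=\{S\in\mathcal F : e\in S\}$ those sets of $\mathcal F$ containing element $e$. If $|\mathcal F_e|\ge r_e$, we say that $e$ is fully covered by $\mathcal F$, denoted as $e\lhd\mathcal F$. The cost of sub-collection $\mathcal F$ is $c(\mathcal F)=\sum_{S\in\mathcal F}c(S)$. Given $(E,\mathcal S,c,r)$ with $|E|=n$ and a real number $q$ which is a constant between 0 and 1, the PSMC problem is to find a minimum cost sub-collection $\mathcal F$ such that $|\{e\in E : e\lhd\mathcal F\}|\ge qn$. An instance of PSMC is denoted as $(E,\mathcal S,c,r,q)$.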
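To make the definition concrete, the sketch below computes which elements a sub-collection fully covers and checks the PSMC coverage condition $|\{e : e\lhd\mathcal F\}|\ge qn$. The helper names (`fully_covered`, `is_psmc_feasible`, `total_cost`) and the toy instance are hypothetical illustrations, not taken from the paper; it assumes sets are given as a dictionary from set names to element sets, and that the keys of `requirement` are exactly the elements of $E$.

```python
def fully_covered(sets, requirement, chosen):
    """Elements e with |F_e| >= r_e, where F is the collection of set names `chosen`.
    Assumes the keys of `requirement` are exactly the elements of E."""
    count = {e: 0 for e in requirement}
    for name in chosen:
        for e in sets[name]:
            count[e] += 1
    return {e for e in requirement if count[e] >= requirement[e]}


def is_psmc_feasible(sets, requirement, q, chosen):
    """True if `chosen` fully covers at least q * n elements, where n = |E|."""
    return len(fully_covered(sets, requirement, chosen)) >= q * len(requirement)


def total_cost(costs, chosen):
    """c(F): the sum of the costs of the chosen sets."""
    return sum(costs[name] for name in chosen)


# Hypothetical toy instance with E = {1, 2, 3}.
sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}}
requirement = {1: 2, 2: 2, 3: 1}
costs = {"A": 1.0, "B": 2.0, "C": 1.5}

F = {"A", "B"}
print(fully_covered(sets, requirement, F))          # {2, 3}
print(is_psmc_feasible(sets, requirement, 0.6, F))  # True  (2 >= 0.6 * 3)
print(total_cost(costs, F))                         # 3.0
```

Element 1 lies in only one chosen set but needs two, so it does not count, even though it is partly covered.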
The PSMC problem includes two important variants of the set cover problem. When $r_e=1$ for every $e\in E$, it is the partial set cover problem (PSC). When the covering ratio $q=1$, it is the set multi-cover problem (SMC). One motivation for PSC is that in the real world, satisfying all requirements may be too costly or even impossible, due to resource limitations or political policies. SMC arises from the requirement of fault tolerance in practice. There is a large body of research on PSC and SMC, achieving performance ratios that match the lower bounds for the classic set cover problem, namely $O(\ln n)$ and $f$, where $n$ is the number of elements and $f$ is the maximum number of sets containing a common element. However, studies on the combination of these two problems are rare. According to our recent paper, under the ETH assumption, the PSMC problem cannot be approximated within a certain nearly polynomial factor in $n$, where the factor involves some constant $c$.
The aim of this paper is to explore a greedy strategy for PSMC.
1.1 Related Work
The set cover problem (SC) was one of the 21 problems proved to be NP-complete in Karp's seminal paper. In fact, Feige proved that it cannot be approximated within a factor of $(1-\varepsilon)\ln n$ for any $\varepsilon>0$ unless $NP\subseteq DTIME(n^{O(\log\log n)})$, where $n$ is the number of elements. Dinur and Steurer proved the same lower bound under the assumption $P\neq NP$. Khot and Regev showed that it cannot be approximated within a factor of $f-\varepsilon$ for any constant $\varepsilon>0$, assuming that the unique games conjecture is true, where $f$ is the maximum number of sets containing a common element. On the other hand, the greedy strategy achieves performance ratio $H(\Delta)$ [7, 15, 20], where $\Delta$ is the maximum cardinality of a set and $H(\Delta)=\sum_{i=1}^{\Delta}1/i$ is the Harmonic number. An $f$-approximation can be obtained by either the LP rounding method or the local ratio method.
For SMC, a greedy algorithm achieving the same performance ratio $H(\Delta)$ was also given, using dual fitting analysis, which implies that the integrality gap of the classic linear program for SMC is at most $H(\Delta)$.
For the partial set cover problem, Kearns gave a greedy algorithm achieving performance ratio $2H(n)+3$. By slightly modifying the greedy algorithm, Slavík improved the performance ratio to $H(\min\{\lceil qn\rceil,\Delta\})$, where $q$ is the fraction of elements required to be covered. Gandhi et al. proposed a primal-dual algorithm achieving performance ratio $f$. Bar-Yehuda studied a generalized version in which each element has a profit and the total profit of covered elements should exceed a threshold; using the local ratio method, he also obtained performance ratio $f$. Könemann et al. presented a Lagrangian relaxation framework and obtained performance ratio $(4/3+\varepsilon)H(\Delta)$ for the generalized partial set cover problem.
From the above related work, it can be seen that both PSC and SMC admit performance ratios matching the best ratios for the classic set cover problem. However, combining partial set cover with set multi-cover makes the problem considerably harder. Ran et al. were the first to study approximation algorithms for PSMC, using a greedy strategy and dual-fitting analysis. However, their ratio is meaningful only when the covering ratio $q$ is very close to 1. In another paper, the authors presented a simple greedy algorithm with a guaranteed performance ratio. They also presented a local ratio algorithm, which reveals what they called a “shock wave” phenomenon: its performance ratio is $f$ for both PSC and SMC; however, when $q$ is smaller than 1 by a very small constant, the ratio jumps abruptly to a much larger value. In our recent paper, we proved, by a reduction from the well-known densest $k$-subgraph problem, that PSMC cannot have a performance ratio better than polynomial.
1.2 Our Contribution and Techniques
The contributions of this paper are summarized as follows.
A new problem called minimum density sub-collection (MDSC) is defined, which is to find a sub-collection $\mathcal F$ minimizing the ratio $c(\mathcal F)/|C(\mathcal F)|$, where $C(\mathcal F)$ is the set of elements fully covered by $\mathcal F$. We prove that MDSC is also NP-hard.
We show that if MDSC has a $\gamma$-approximation algorithm, then PSMC has an $(O(\gamma/\varepsilon),1-\varepsilon)$-bicriteria approximation algorithm, that is, the output of our algorithm has cost at most $O(\gamma/\varepsilon)$ times that of an optimal solution, while the total number of fully covered elements is at least $(1-\varepsilon)qn$, where $q$ is the covering ratio required by the problem and $n$ is the total number of elements.
We design an $O(r_{\max}\log^2 n)$-approximation algorithm for MDSC, where $r_{\max}$ is the maximum covering requirement of elements. Combining this result with the above, PSMC has an $(O(r_{\max}\log^2 n/\varepsilon),1-\varepsilon)$-bicriteria algorithm.
Our algorithm uses a greedy strategy. However, it is not obvious which sets should be chosen in each iteration. As indicated by previous studies, a natural generalization of the classic greedy algorithm cannot yield good results. One reason might be that the number of elements fully covered by a sub-collection of sets is not a submodular function. In this paper, our greedy algorithm iteratively picks an approximate solution to the MDSC problem until the number of fully covered elements reaches a certain level. An obstacle to obtaining a good approximation factor lies in the last iteration: the sub-collection chosen in the last iteration might fully cover many more elements than required. Although its density is low, its cost might be too large to be bounded by the optimal value. So we stop the algorithm when at least $(1-\varepsilon)qn$ elements are fully covered, which leads to a bicriteria approximation algorithm.
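The failure of submodularity can be checked on a two-set instance. The minimal sketch below (instance and function name are ours, for illustration) shows that adding a set to a larger collection can increase the number of fully covered elements more than adding it to a smaller collection, which a submodular function never allows.

```python
def num_fully_covered(sets, requirement, chosen):
    """Number of elements e whose covering requirement r_e is met by `chosen`."""
    count = {}
    for name in chosen:
        for e in sets[name]:
            count[e] = count.get(e, 0) + 1
    return sum(1 for e, r in requirement.items() if count.get(e, 0) >= r)


# One element e with covering requirement 2, and two identical sets containing it.
sets = {"S1": {"e"}, "S2": {"e"}}
requirement = {"e": 2}

# Marginal gain of adding S2 to the smaller collection {} ...
gain_small = (num_fully_covered(sets, requirement, {"S2"})
              - num_fully_covered(sets, requirement, set()))
# ... and to the larger collection {S1}.
gain_large = (num_fully_covered(sets, requirement, {"S1", "S2"})
              - num_fully_covered(sets, requirement, {"S1"}))

# Submodularity would require gain_small >= gain_large.
print(gain_small, gain_large)  # 0 1
```

The same “all-or-nothing” effect is what prevents a set-by-set greedy rule from measuring progress reliably.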
A crucial stepping stone to the above algorithm is the MDSC problem, which is also NP-hard. An example can be constructed showing that a natural LP formulation has an arbitrarily large integrality gap. To overcome this difficulty, we formulate the problem as a linear program in a flow-like language, and use an approximation algorithm for the minimum node-weighted Steiner network problem as a subroutine to obtain an approximation algorithm for MDSC with a guaranteed performance ratio.
The paper is organized as follows. In Section 2, we give the definition of MDSC and prove its NP-hardness. In Section 3, we show how a $\gamma$-approximation algorithm for MDSC leads to an $(O(\gamma/\varepsilon),1-\varepsilon)$-bicriteria algorithm for PSMC. In Section 4, we propose an $O(r_{\max}\log^2 n)$-approximation algorithm for MDSC. In Section 5, the paper is concluded with some discussions on future work.
A preliminary version of this paper was presented at INFOCOM 2017. That version contains a flaw, which we explain in the appendix.
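For simplicity of statement, we use $C(\mathcal F)$ to denote the set of elements fully covered by sub-collection $\mathcal F$. Define the density of a sub-collection $\mathcal F$ as
$$\mathrm{den}(\mathcal F)=\frac{c(\mathcal F)}{|C(\mathcal F)|}.$$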
Definition 2.1 (Minimum Density Sub-Collection (MDSC)).
Given $(E,\mathcal S,c,r)$, the MDSC problem is to find a sub-collection $\mathcal F\subseteq\mathcal S$ with the minimum density.
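As a reference point, MDSC can be solved exactly by exhaustive search over all nonempty sub-collections, at exponential cost. The sketch below (hypothetical function names and toy instance; this is not the algorithm of this paper) returns a minimum-density sub-collection, treating sub-collections that fully cover no element as having undefined density. Exact fractions avoid floating-point ties.

```python
from fractions import Fraction
from itertools import combinations


def fully_covered(sets, requirement, chosen):
    """C(F): elements whose requirement is met by the sets named in `chosen`."""
    count = {}
    for name in chosen:
        for e in sets[name]:
            count[e] = count.get(e, 0) + 1
    return {e for e, r in requirement.items() if count.get(e, 0) >= r}


def density(costs, sets, requirement, chosen):
    """c(F) / |C(F)|, or None when no element is fully covered (density undefined)."""
    covered = fully_covered(sets, requirement, chosen)
    if not covered:
        return None
    return Fraction(sum(costs[s] for s in chosen), len(covered))


def brute_force_mdsc(costs, sets, requirement):
    """Exact MDSC by trying every nonempty sub-collection: O(2^m) time for m sets."""
    best, best_density = None, None
    names = sorted(sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            d = density(costs, sets, requirement, combo)
            if d is not None and (best_density is None or d < best_density):
                best, best_density = set(combo), d
    return best, best_density


sets = {"A": {1, 2}, "B": {2, 3}, "C": {3}}
requirement = {1: 1, 2: 2, 3: 1}
costs = {"A": 2, "B": 2, "C": 2}
best, d = brute_force_mdsc(costs, sets, requirement)
print(sorted(best), d)  # ['A', 'B'] 4/3
```

On this toy instance every single set has density 2, while the pair {A, B} reaches 4/3, because element 2 is fully covered only when both sets are chosen.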
Unfortunately, MDSC is also NP-hard.
The MDSC problem is NP-hard.
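Proof. We reduce the perfect 3-dimensional matching problem (which is NP-complete) to MDSC. Given an integer $k$, three sets $X,Y,Z$ each having cardinality $k$, and a set $T\subseteq X\times Y\times Z$, the perfect 3-dimensional matching problem asks whether there is a subset $M\subseteq T$ with $|M|=k$ such that any two distinct triples $(x_1,y_1,z_1),(x_2,y_2,z_2)\in M$ satisfy $x_1\neq x_2$, $y_1\neq y_2$ and $z_1\neq z_2$. Construct an instance of MDSC as follows. Let $E=X\cup Y\cup Z\cup\{e_0\}$ and $\mathcal S=\{S_t : t\in T\}$, where $S_{(x,y,z)}=\{x,y,z,e_0\}$. The covering requirements are $r_{e_0}=k$ and $r_e=1$ for $e\in X\cup Y\cup Z$. The cost is $c(S_t)=1$ for all $t\in T$.
Next, we show that there is a perfect 3-dimensional matching if and only if the optimal value of this MDSC instance is $k/(3k+1)$. In fact, if $M$ is a perfect 3-dimensional matching, then $\mathcal F=\{S_t : t\in M\}$ has $c(\mathcal F)=k$ and $|C(\mathcal F)|=3k+1$. Conversely, consider an arbitrary nonempty sub-collection $\mathcal F$, its corresponding subset $T'\subseteq T$, and let $m=|\mathcal F|$. Then $c(\mathcal F)=m$. If $m<k$, then $e_0$ is not fully covered and at most $3m$ elements are fully covered, so $\mathrm{den}(\mathcal F)\ge 1/3>k/(3k+1)$. If $m\ge k$, then $|C(\mathcal F)|\le 3k+1$ and thus $\mathrm{den}(\mathcal F)\ge m/(3k+1)\ge k/(3k+1)$. In this case, equality holds only if $m=k$ and all $3k$ elements of $X\cup Y\cup Z$ are fully covered by the $k$ sets, that is, only if $T'$ is a perfect 3-dimensional matching. The claimed result is proved. ∎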
3 Bicriteria Algorithm for PSMC
In this section, we use a $\gamma$-approximation algorithm for MDSC to design a bicriteria algorithm for PSMC.
3.1 The Algorithm
The algorithm is presented in Algorithm 1. It follows the classic greedy strategy. The main difference is that, instead of choosing sets one by one, in each iteration it uses a $\gamma$-approximation algorithm for MDSC to greedily choose a sub-collection. After each iteration, the instance $(E,\mathcal S,c,r,q)$ is updated with respect to the current sub-collection $\mathcal F$ to form a reduced instance $(E',\mathcal S',c,r',q')$, where $E'=E\setminus C(\mathcal F)$ is the set of elements not yet fully covered, the total remaining covering ratio is
$$q'=\frac{q|E|-|C(\mathcal F)|}{|E'|},\qquad(1)$$
the remaining covering requirement of element $e\in E'$ is $r'_e=r_e-|\mathcal F_e|$, and the elements fully covered by $\mathcal F$ are removed from each set. In the following, when we mention a reduced instance or say that the instance is updated, it is always understood that the above operations are executed. When the algorithm terminates, we have $q'|E'|\le\varepsilon qn$, and thus, by the expression for the reduced covering ratio in (1), the number of fully covered elements is at least $(1-\varepsilon)qn$.
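The greedy loop can be sketched as follows. This is an illustrative sketch, not Algorithm 1 verbatim: the function names are hypothetical, the stopping rule is that at least $(1-\varepsilon)q|E|$ elements are fully covered, and the MDSC subroutine is replaced by exhaustive search (an exact oracle, i.e. $\gamma=1$), so it only runs on tiny instances. The reduced instance is formed as described above (fully covered elements are removed from every set and requirements are decreased accordingly); in addition, we assume that already selected sets are not offered again.

```python
from fractions import Fraction
from itertools import combinations


def exhaustive_mdsc(costs, sets, requirement):
    """Exact minimum-density sub-collection (exponential time); a stand-in for a
    gamma-approximation oracle with gamma = 1. Returns None if nothing can be covered."""
    best, best_density = None, None
    names = sorted(sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            count = {}
            for s in combo:
                for e in sets[s]:
                    count[e] = count.get(e, 0) + 1
            covered = [e for e, r in requirement.items() if count.get(e, 0) >= r]
            if covered:
                d = Fraction(sum(costs[s] for s in combo), len(covered))
                if best_density is None or d < best_density:
                    best, best_density = set(combo), d
    return best


def greedy_psmc(costs, sets, requirement, q, eps, oracle=exhaustive_mdsc):
    """Greedy bicriteria sketch: repeatedly add a low-density sub-collection of the
    reduced instance until at least (1 - eps) * q * n elements are fully covered."""
    target = (1 - eps) * q * len(requirement)
    selected = set()
    remaining = dict(requirement)  # r'_e for elements not yet fully covered
    covered_total = 0
    while covered_total < target:
        # Reduced instance: unselected sets, with fully covered elements removed.
        reduced = {s: sets[s] & set(remaining) for s in sets if s not in selected}
        chosen = oracle(costs, reduced, remaining)
        if chosen is None:
            raise ValueError("the required coverage cannot be reached")
        selected |= chosen
        for s in chosen:
            for e in reduced[s]:
                remaining[e] -= 1
        newly_covered = [e for e, r in remaining.items() if r <= 0]
        for e in newly_covered:
            del remaining[e]
        covered_total += len(newly_covered)
    return selected


sets = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {4}}
requirement = {1: 1, 2: 2, 3: 1, 4: 2}
costs = {name: 1 for name in sets}
print(sorted(greedy_psmc(costs, sets, requirement, q=1.0, eps=0.5)))  # ['A', 'B']
print(sorted(greedy_psmc(costs, sets, requirement, q=1.0, eps=0.0)))  # ['A', 'B', 'C', 'D']
```

With $\varepsilon=0.5$ the loop stops after one iteration, since {A, B} already fully covers 3 of the 4 elements; with $\varepsilon=0$ a second iteration is needed to meet the requirement of element 4.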
3.2 Performance Ratio Analysis
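Suppose Algorithm 1 runs for $g$ iterations, selecting sub-collections $\mathcal F_1,\ldots,\mathcal F_g$. We estimate the costs $c(\mathcal F_1\cup\cdots\cup\mathcal F_{g-1})$ and $c(\mathcal F_g)$ separately. In the following, $\mathcal F^*$ denotes an optimal solution to PSMC, and $opt=c(\mathcal F^*)$ is the optimal cost.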
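For $i=1,\ldots,g$, denote by $n_i$ the number of elements remaining to be fully covered after $\mathcal F_i$ is selected. Then
$$c(\mathcal F_i)\le\gamma\,\frac{n_{i-1}-n_i}{n_{i-1}}\,opt\quad\text{for } i=1,\ldots,g,$$
where $n_0=qn$.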
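After the $(i-1)$-th iteration, $\mathcal F^*$ is a sub-collection fulfilling the remaining covering requirement $n_{i-1}$: its sets that have not yet been selected fully cover, in the reduced instance, at least $n_{i-1}$ of the remaining elements, at cost at most $opt$. So the density of an optimal solution to the MDSC problem in the $i$-th iteration is at most $opt/n_{i-1}$. Since $\mathcal F_i$ approximates the minimum density within a factor of $\gamma$, we have
$$\frac{c(\mathcal F_i)}{n_{i-1}-n_i}\le\gamma\cdot\frac{opt}{n_{i-1}}.$$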
The lemma is proved. ∎
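To see how the per-iteration bound yields the cost side of the bicriteria guarantee, here is one way to sum it. This is a sketch under two stated assumptions: each iteration satisfies $c(\mathcal F_i)\le\gamma\,\frac{n_{i-1}-n_i}{n_{i-1}}\,opt$ with $n_0=qn$, and the algorithm stops at the first $g$ with $n_g\le\varepsilon qn$, so that $n_{g-1}>\varepsilon qn$. The constants obtained here are our own and need not coincide with those in the following theorem.

```latex
\begin{aligned}
\sum_{i=1}^{g-1}\frac{n_{i-1}-n_i}{n_{i-1}}
  &\le \sum_{i=1}^{g-1}\ln\frac{n_{i-1}}{n_i}
   = \ln\frac{n_0}{n_{g-1}} < \ln\frac{1}{\varepsilon}
  && \text{(using } 1-t\le -\ln t \text{ for } 0<t\le 1\text{)},\\
\frac{n_{g-1}-n_g}{n_{g-1}}
  &\le \frac{n}{n_{g-1}} < \frac{1}{\varepsilon q}
  && \text{(at most $n$ elements become fully covered)},\\
c(\mathcal F_1\cup\cdots\cup\mathcal F_g)
  &\le \sum_{i=1}^{g} c(\mathcal F_i)
   \le \gamma\left(\ln\frac{1}{\varepsilon}+\frac{1}{\varepsilon q}\right) opt .
\end{aligned}
```

Since $\ln(1/\varepsilon)\le 1/\varepsilon$ and $q\le 1$, the bracket is at most $2/(\varepsilon q)$, which is $O(1/\varepsilon)$ when $q$ is a constant.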
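Using a $\gamma$-approximation algorithm for MDSC, the PSMC problem admits a bicriteria approximation algorithm whose output fully covers at least $(1-\varepsilon)qn$ elements.
For small $\varepsilon$, the cost ratio in the above theorem can be viewed as $O(\gamma/\varepsilon)$, since $q$ is a constant.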
4 Approximation Algorithm for MDSC
In this section, we present an approximation algorithm for MDSC. The algorithm is based on an LP formulation and uses an approximation algorithm for the node-weighted Steiner network problem as a subroutine.
The following is a natural integer programming formulation of MDSC.
Here $x_S$ indicates whether set $S$ is selected and $y_e$ indicates whether element $e$ is fully covered. The first constraint says that if $y_e=1$, then at least $r_e$ sets containing $e$ must be selected, and thus $e$ is fully covered. Relaxing the integrality constraints of the above program and scaling, we obtain the following linear program:
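As noted in Section 1.2, an example can be constructed showing that this linear program has an arbitrarily large integrality gap. Hence, to obtain a good approximation, we need a different formulation. In the following, we formulate the problem in a more involved, flow-like language. For an element $e$, an $e$-cover-set is a sub-collection $\mathcal A\subseteq\mathcal S$ with $|\mathcal A|=r_e$ whose sets all contain $e$, so that $\mathcal A$ fully covers $e$. Denote by $\mathcal C_e$ the family of all $e$-cover-sets, and let $\mathcal C=\bigcup_{e\in E}\mathcal C_e$. Consider the following example.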
Example 4.2. with , and , and and . For this example, , and . It should be emphasized that the same cover-set belonging to different $\mathcal C_e$'s is viewed as different cover-sets. For example, belongs to both and . To distinguish them, we use to denote cover-sets in . For example, contains three -cover-sets , and , contains one -cover-set , and contains two -cover-sets and .
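The following sketch enumerates $e$-cover-sets on a small hypothetical instance (chosen by us, not Example 4.2), under the assumption that an $e$-cover-set consists of exactly $r_e$ sets containing $e$. It illustrates that $|\mathcal C_e|=\binom{|\mathcal S_e|}{r_e}$, where $\mathcal S_e$ denotes the sets containing $e$, can be exponentially large, and that the same sub-collection may appear in several families; keeping one family per element mirrors the labeling of cover-sets by the element they serve.

```python
from itertools import combinations
from math import comb


def cover_sets(sets, requirement, e):
    """All e-cover-sets: sub-collections of exactly r_e sets that each contain e."""
    containing = sorted(s for s, members in sets.items() if e in members)
    return [frozenset(c) for c in combinations(containing, requirement[e])]


# Hypothetical instance (not taken from the paper).
sets = {"S1": {"e1", "e2"}, "S2": {"e1", "e2"}, "S3": {"e1", "e3"}, "S4": {"e3"}}
requirement = {"e1": 2, "e2": 2, "e3": 1}

# One family C_e per element, so a sub-collection serving two elements appears twice.
families = {e: cover_sets(sets, requirement, e) for e in requirement}
print({e: len(fam) for e, fam in families.items()})  # {'e1': 3, 'e2': 1, 'e3': 2}

# |C_e| equals comb(|S_e|, r_e).
assert all(len(families[e]) == comb(sum(e in m for m in sets.values()), requirement[e])
           for e in requirement)
```

Here {S1, S2} is both an $e_1$-cover-set and an $e_2$-cover-set, and it is counted once in each family.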
The following is an integer program for constrained MDSC:
In fact, $z_{\mathcal A}$ indicates whether a cover-set $\mathcal A$ is selected and $x_S$ indicates whether set $S$ is selected. The first constraint says that if $y_e=1$, then at least one $e$-cover-set is selected, and thus $e$ is fully covered. The family of selected sets is the union of all selected cover-sets. So, if $S$ belongs to some selected cover-set, then $x_S$ should be 1, namely,
$$x_S=\max\{z_{\mathcal A} : \mathcal A\in\mathcal C,\ S\in\mathcal A\}.\qquad(8)$$
Notice that to fully cover element $e$, it suffices to select exactly one $e$-cover-set from $\mathcal C_e$. So, for the purpose of linearization, we may replace (8) by the second constraint of (7). The objective function is exactly the density of the selected sets.
Consider Example 4.2 again. Setting and all other -values to be 0 implies that the selected sub-collection and are fully covered. By the second constraint, and we may take (to minimize the objective function, it is better to take to be 0 if the right hand side of the second constraint is 0). By the first constraint, and we may take (to minimize the objective function, it is better to take to be 1 for all those elements which are fully covered). Notice that serves as both and , the -value for the former is 0 and the -value for the latter is , they are set independently.
The above integer program (7) can be relaxed to the following linear program LP:
Note that although there are exponentially many variables, the linear program can be solved in polynomial time, as explained below. Consider the dual program of (9):
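By LP duality, one may solve (9) by solving its dual, and to solve the dual in polynomial time, it suffices to construct a separation oracle for its third constraint. For any $e\in E$ and $\mathcal A\in\mathcal C_e$, define $\beta_e(\mathcal A)=\sum_{S\in\mathcal A}\beta_{e,S}$, where $\beta_{e,S}$ is the dual variable associated with the constraint of (9) for the pair $(e,S)$; the third dual constraint then requires $\alpha_e\le\beta_e(\mathcal A)$ for all $\mathcal A\in\mathcal C_e$, where $\alpha_e$ is the dual variable of the constraint of (9) for element $e$. For any element $e$, a cover-set minimizing $\beta_e(\mathcal A)$ can be found by choosing the $r_e$ cheapest sets containing $e$, measured by $\beta_{e,S}$. By checking whether $\alpha_e\le\min_{\mathcal A\in\mathcal C_e}\beta_e(\mathcal A)$ holds for every $e$, we can either certify that these constraints are satisfied or find a violated one. Using the ellipsoid method, the dual program is polynomial-time solvable.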
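A minimal sketch of this separation oracle follows. The names `alpha` and `beta` for the dual variables, and the form of the checked constraint, $\alpha_e\le\sum_{S\in\mathcal A}\beta_{e,S}$ for every $e$-cover-set $\mathcal A$, are our assumptions about the dual; the point illustrated is only that the minimizing cover-set consists of the $r_e$ sets containing $e$ with the smallest weights, so $\mathcal C_e$ never has to be enumerated.

```python
from fractions import Fraction


def find_violated_constraint(sets, requirement, alpha, beta):
    """Separation oracle sketch for constraints alpha[e] <= sum_{S in A} beta[(e, S)]
    over all e-cover-sets A. Returns (e, A) for a violated constraint, or None."""
    for e, r in requirement.items():
        containing = [s for s, members in sets.items() if e in members]
        if len(containing) < r:
            continue  # no e-cover-set exists, so there is nothing to check for e
        # Over all r-subsets, the sum of weights is minimized by the r smallest ones.
        cheapest = sorted(containing, key=lambda s: beta[(e, s)])[:r]
        if alpha[e] > sum(beta[(e, s)] for s in cheapest):
            return e, frozenset(cheapest)
    return None


sets = {"S1": {"e1"}, "S2": {"e1"}, "S3": {"e1"}}
requirement = {"e1": 2}
beta = {("e1", "S1"): Fraction(3), ("e1", "S2"): Fraction(1), ("e1", "S3"): Fraction(2)}

print(find_violated_constraint(sets, requirement, {"e1": Fraction(5, 2)}, beta))  # None
# alpha = 7/2 exceeds the cheapest cover-set weight 1 + 2 = 3, so the oracle
# reports e1 together with the cover-set {S2, S3}.
print(find_violated_constraint(sets, requirement, {"e1": Fraction(7, 2)}, beta))
```

Sorting costs $O(|\mathcal S_e|\log|\mathcal S_e|)$ per element, so each oracle call is polynomial even though $|\mathcal C_e|$ may be exponential.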
4.2 The Algorithm
Inspired by the method of paper  for network design problems, we design an approximation algorithm for MDSC which makes use of an approximation algorithm for the minimum node weighted Steiner network problem.
Definition 4.4 (Node Weighted Steiner Network Problem (NWSN) ).
Given a graph $G$ with node set $V$, a weight function $w$ on $V$, and a connectivity requirement $r(u,v)$ for each pair of nodes $u,v\in V$, the minimum node-weighted Steiner network problem asks for a subgraph $H$ of $G$ such that every pair of nodes $u,v$ is connected by at least $r(u,v)$ edge-disjoint paths in $H$ and the node weight of $H$ is as small as possible.
Notice that $H$ must include every node $u$ with $r(u,v)>0$ for at least one node $v$; such a node can be viewed as a terminal node. On the other hand, a node $u$ with $r(u,v)=0$ for every $v$ need not be included in $H$; such nodes are Steiner nodes. Thus the NWSN problem is to select a minimum-weight set of Steiner nodes satisfying the connectivity requirements between terminal nodes.
The algorithm is presented in Algorithm 2. It partitions the elements into a disjoint union of sets $E_1,\ldots,E_t$ according to an optimal fractional solution to linear program (9). Let $E_j$ be a set satisfying the condition specified in line 3 of the algorithm; its existence will be shown later. The NWSN instance used in line 4 of the algorithm is constructed as follows. Let $G_j$ be the graph on node set $\{s\}\cup\mathcal S\cup E_j$, where $s$ is a new root node, and edge set $\{(s,S):S\in\mathcal S\}\cup\{(S,e):S\in\mathcal S,\ e\in S\cap E_j\}$. Set the weight of every $S\in\mathcal S$ to be $c(S)$ and the weight of all other nodes to be zero. Set the connectivity requirement $r(s,e)=r_e$ for every $e\in E_j$ and the connectivity requirement of all other node pairs to be zero. Denote the constructed instance as $\mathcal I_j$. The output of the algorithm is the sub-collection of sets corresponding to the Steiner nodes in the Steiner network computed on $\mathcal I_j$.
The rationale behind the algorithm will be manifested through the analysis in the following subsection.
4.3 Theoretical Analysis
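Notice that any feasible solution $H$ to the NWSN instance constructed from $E_j$ induces a feasible solution to the multi-cover problem on the elements of $E_j$. In fact, suppose element $e\in E_j$ is connected to the root node $s$ by $r_e$ edge-disjoint paths in $H$. Every $s$-$e$ path enters $e$ through an edge $(S,e)$ with $e\in S$, and edge-disjoint paths use distinct such edges, so at least $r_e$ sets of $H$ contain $e$, that is, these sets fully cover $e$ (the simplest such paths have the form $s,S,e$). Taking the union of such sets fully covers all elements of $E_j$.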
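The construction and the argument above can be checked on a toy example. In the sketch below (function names and instance are hypothetical), the graph has a root node, one node per set with weight $c(S)$, and one node per element of the chosen part, with edges root-$S$ and $S$-$e$ for $e\in S$; the connectivity requirement between the root and $e$ is $r_e$. The number of edge-disjoint root-$e$ paths in a subnetwork is computed by a standard unit-capacity max-flow (Edmonds-Karp) on the undirected graph, and it equals the number of kept sets containing $e$, as the argument predicts.

```python
from collections import deque


def build_nwsn_graph(costs, sets, part):
    """Root node, one node per set, one node per element of `part`;
    edges root-S for every set S and S-e for every e in S that lies in `part`."""
    root = ("root",)
    edges = [(root, ("set", s)) for s in sets]
    edges += [(("set", s), ("elem", e))
              for s, members in sets.items() for e in members if e in part]
    weight = {("set", s): costs[s] for s in sets}  # root and element nodes weigh 0
    return root, edges, weight


def edge_disjoint_paths(kept, edges, source, sink):
    """Max number of edge-disjoint source-sink paths in the subgraph induced by `kept`,
    computed as a unit-capacity max flow (Edmonds-Karp) on the undirected graph."""
    residual, adj = {}, {v: set() for v in kept}
    for u, v in edges:
        if u in kept and v in kept:
            residual[(u, v)] = residual.get((u, v), 0) + 1
            residual[(v, u)] = residual.get((v, u), 0) + 1
            adj[u].add(v)
            adj[v].add(u)
    flow = 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        v = sink
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            residual[(u, v)] -= 1
            residual[(v, u)] += 1
            v = u
        flow += 1


costs = {"A": 1, "B": 1, "C": 1}
sets = {"A": {1, 2}, "B": {1}, "C": {2}}
requirement = {1: 2, 2: 1}
part = {1, 2}
root, edges, weight = build_nwsn_graph(costs, sets, part)

chosen = {"A", "B"}  # Steiner nodes kept in the network
kept = {root} | {("set", s) for s in chosen} | {("elem", e) for e in part}
for e in sorted(part):
    k = edge_disjoint_paths(kept, edges, root, ("elem", e))
    print(e, k, k >= requirement[e])
# 1 2 True
# 2 1 True
```

Dropping set B from the kept nodes leaves element 1 with a single edge-disjoint path, matching the single kept set that contains it.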
To analyze the correctness and the performance ratio, we first give an LP-relaxation for the set multi-cover problem and an LP-relaxation for the NWSN problem.
LP-relaxation for set multi-cover. Similar to the construction of integer program (7), the multi-cover problem on instance can be formulated as an integer linear program whose relaxation is as follows:
LP-relaxation for NWSN. Next, consider the node-weighted Steiner network problem. For each pair of nodes $u$ and $v$, a $(u,v)$-path-set is a set of $r(u,v)$ edge-disjoint $u$-$v$ paths in $G$. Denote by $\mathcal P_{uv}$ the family of all $(u,v)$-path-sets and let $\mathcal P$ be the union of all these families. The following linear program, presented in prior work, is a relaxation for the NWSN problem:
In fact, in the corresponding integral formulation, in which the variables take values in $\{0,1\}$, $f_P$ indicates whether path-set $P$ is chosen and $x_v$ indicates whether node $v$ is chosen. The original model in that work uses equality instead of inequality in the first constraint, meaning that for each pair of nodes $u$ and $v$, exactly one $(u,v)$-path-set is chosen, and thus the connectivity requirement between $u$ and $v$ is satisfied. The second constraint says that if node $v$ belongs to some chosen path-set, then $v$ must be chosen. Hence the chosen nodes are exactly the nodes on the union of chosen path-sets, and the objective is to minimize their total weight. When the variables are relaxed to fractional values, an optimal solution can be assumed to satisfy $\sum_{P\in\mathcal P_{uv}}f_P=1$, $f_P\le 1$ and $x_v\le 1$. Hence it does not matter if we relax the first constraint to an inequality and do not explicitly require $f_P$ and $x_v$ to be at most 1.
Now, we are ready to analyze the performance ratio of Algorithm 2.
For $n$ sufficiently large, Algorithm 2 has performance ratio $O(r_{\max}\log^2 n)$ for constrained MDSC.
We prove the theorem step by step by first establishing the following three claims.
Claim 1. An index as in Line 3 of Algorithm 2 exists and .
In fact, since is decomposed into parts , by the constraint , there exists an index such that . Since for every , we have .
Since and for each , it can be calculated that for . Hence the above .
Claim 2. , where is the optimal value of linear program (11).
Let for each set and let for each cover-set . For any element and any set , we have
Since , we have for every . Hence
This implies that is a feasible solution to (11). Hence
where the last inequality comes from Lemma 4.3.
Claim 3. , where is the optimal value of linear program (12).
PSMC has an $(O(r_{\max}\log^2 n/\varepsilon),1-\varepsilon)$-bicriteria algorithm.
5 Conclusion and Discussion
In this paper, we studied the partial set multi-cover problem (PSMC). By proposing a new NP-hard problem called minimum density sub-collection (MDSC) and designing an $O(r_{\max}\log^2 n)$-approximation for MDSC, we obtained an $(O(r_{\max}\log^2 n/\varepsilon),1-\varepsilon)$-bicriteria algorithm for PSMC.
Our study shows that PSMC is a very challenging problem. One reason is that it possesses an “all-or-nothing” property. As an illustration, suppose the covering requirement of each element is 10. Covering an element 9 times has the same effect as not covering it at all. So, although an algorithm may pick a large number of sets, it is still possible that only very few elements have their covering requirements satisfied. Since only the number of fully covered elements matters, much effort might be wasted on covering elements that never become fully covered. Thus, to obtain a good approximation, one has to control this waste. Such all-or-nothing phenomena are interesting and appear frequently in the real world. New ideas are needed, and overcoming this difficulty would be of great theoretical value.
Our performance ratio depends on the maximum covering requirement $r_{\max}$. New ideas are needed to design algorithms whose ratio does not depend on $r_{\max}$. Designing approximation algorithms that do not violate the covering ratio (that is, algorithms that are not bicriteria) is another challenging problem.
This research is supported by NSFC (11771013, 11531011, 61751303).
[1] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, M. Protasi, Complexity and Approximation: Combinatorial Optimization Problems and Their Approximability Properties. Springer, 2003.
[2] R. Bar-Yehuda, Using homogeneous weights for approximating the partial cover problem. Journal of Algorithms, 39 (2001) 137–144.
[3] R. Bar-Yehuda, S. Even, A local-ratio theorem for approximating the weighted vertex cover problem. North-Holland Mathematics Studies, 109 (1985) 27–45.
[4] M. Charikar, S. Khuller, D.M. Mount, G. Narasimhan, Algorithms for facility location problems with outliers. Proc. 12th ACM-SIAM Symposium on Discrete Algorithms (2001) 642–651.
[5] C. Chekuri, A. Ene, A. Vakilian, Prize-collecting survivable network design in node-weighted graphs. APPROX/RANDOM, LNCS 7408 (2012) 98–109.
[6] C. Chekuri, G. Even, A. Gupta, D. Segev, Set connectivity problems in undirected graphs and the directed Steiner network problem. ACM Transactions on Algorithms, 7(2) (2011) 18:1–18:17.
[7] V. Chvátal, A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4 (1979) 233–235.
[8] D.P. Williamson, D.B. Shmoys, The Design of Approximation Algorithms. Cambridge University Press, 2011.
[9] I. Dinur, D. Steurer, Analytical approach to parallel repetition. Proc. STOC 2014 (2014) 624–633.
[10] G. Dobson, Worst-case analysis of greedy heuristics for integer programming with nonnegative data. Mathematics of Operations Research, 7 (1982) 515–531.
[11] U. Feige, A threshold of ln n for approximating set cover. Proc. 28th ACM Symposium on the Theory of Computing (1996) 312–318.
[12] R. Gandhi, S. Khuller, A. Srinivasan, Approximation algorithms for partial covering problems. Journal of Algorithms, 53(1) (2004) 55–84.
[13] D.S. Hochbaum, Approximation algorithms for the set covering and vertex cover problems. SIAM Journal on Computing, 11 (1982) 555–556.
[14] J.P. Ignizio, T.M. Cavalier, Linear Programming. Prentice-Hall, Upper Saddle River, NJ, USA, 1994.
[15] D.S. Johnson, Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, 9 (1974) 256–278.
[16] R.M. Karp, Reducibility among combinatorial problems. In: R.E. Miller, J.W. Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York, 1972, 85–103.
[17] M. Kearns, The Computational Complexity of Machine Learning. MIT Press, Cambridge, MA, 1990.
[18] S. Khot, O. Regev, Vertex cover might be hard to approximate to within $2-\varepsilon$. Journal of Computer and System Sciences, 74(3) (2008) 335–349.
[19] J. Könemann, O. Parekh, D. Segev, A unified approach to approximating partial covering problems. Algorithmica, 59 (2011) 489–509.
[20] L. Lovász, On the ratio of optimal integral and fractional covers. Discrete Mathematics, 13 (1975) 383–390.
[21] Z. Nutov, Approximating Steiner networks with node weights. SIAM Journal on Computing, 39(7) (2010) 3001–3022.
[22] S. Rajagopalan, V. Vazirani, Primal-dual RNC approximation algorithms for set cover and covering integer programs. SIAM Journal on Computing, 28 (1998) 525–540.
[23] Y. Ran, Z. Zhang, H. Du, Y. Zhu, Approximation algorithm for partial positive influence problem in social network. Journal of Combinatorial Optimization, 33(2) (2017) 791–802.
[24] Y. Ran, Y. Shi, Z. Zhang, Local ratio method on partial set multi-cover. Journal of Combinatorial Optimization, 34(1) (2017) 302–313.
[25] Y. Ran, Y. Shi, Z. Zhang, Primal dual algorithm for partial set multi-cover. Submitted to COCOA 2018.
[26] V. Setty, G. Kreitz, G. Urdaneta, R. Vitenberg, M. van Steen, Maximizing the number of satisfied subscribers in pub/sub systems under capacity constraints. Proc. INFOCOM 2014, 2580–2588.
[27] P. Slavík, Improved performance of the greedy algorithm for partial cover. Information Processing Letters, 64(5) (1997) 251–254.
[28] Z. Zhang, J. Willson, Z. Lu, W. Wu, X. Zhu, D.-Z. Du, Approximating maximum lifetime k-coverage through minimizing weighted k-cover in homogeneous wireless sensor networks. IEEE/ACM Transactions on Networking, 24(6) (2016) 3620–3633.
Appendix: A Flaw in the Conference Version.
A preliminary version of this paper was presented at INFOCOM 2017. Making use of an approximation algorithm for MDSC, it was claimed that one can obtain an -approximation algorithm for PSMC. However, there is a flaw. The algorithm in that paper greedily selects densest sub-collections until at least elements are fully covered. Then it prunes the last sub-collection by greedily selecting sub-collections of consisting of at most sets until the covering requirement is satisfied. Suppose the sub-collections obtained in the pruning step are . The approximation analysis relies on the following inequality:
However, this is not true. Consider the following example
with , , , , , and .
For this example, the densest sub-collection of is . Then the pruning step selects and sequentially. Notice that
Inequality (13) does not hold because