## 1 Introduction

In this paper, we study parallel approximation algorithms for the minimum weight set cover (MinWSC) problem with the small neighborhood cover (SNC) property.

MinWSC is a classic combinatorial optimization problem. Tight approximation ratios have long been known, including

an $H_k$-approximation [8] and an $f$-approximation [3], where $k$ is the size of a maximum set, $f$ is the maximum frequency of elements (that is, the maximum number of sets containing a common element), and $H_k$ is the $k$-th Harmonic number.

With the fast development of computer architecture and the increasing number of CPU cores, designing efficient parallel algorithms has emerged as an active research area in recent years. There are many parallel algorithms for MinWSC [16, 13, 6, 4]. In particular, Khuller et al. [13] gave a parallel algorithm whose approximation ratio is close to $f$, up to a factor depending on a constant $\varepsilon$. Note that $f$ might be as large as the number of sets in the worst case. For problems with special structural properties, it might be possible to obtain a better approximation ratio. For example, Agarwal et al. [2] proposed a structural property called small neighborhood cover (SNC). Many important MinWSC problems possess the SNC property, such as the vertex cover problem (VC), the interval cover problem (IC), the tree cover problem (TC), the interval hitting problem, the priority interval cover problem, and the bag interval cover problem. A parallel algorithm for MinWSC with the $\tau$-SNC property was first studied in [2, 1], where [1] is the preliminary version of [2]. The algorithm of [2] achieves its approximation ratio in a number of rounds depending on the number of sets $n$ and on the depth of the $\tau$-SNC decomposition. Note that for the problems mentioned above, this depth is much smaller than $n$.

In the conclusion part of [2], three questions were proposed. In this paper, we give positive answers to two of them, and present an improved parallel algorithm.

### 1.1 Related work

For MinWSC, Chvátal [8] gave a greedy algorithm achieving approximation ratio $H_k$, where $k$ is the size of a maximum set and $H_k$ is the $k$-th Harmonic number (note that $H_k \leq \ln k + 1$). This ratio is tight under the assumption P $\neq$ NP [9, 10]. In [3], Bar-Yehuda and Even used the primal-dual schema to obtain an approximation ratio of $f$, where $f$ is the maximum number of sets containing a common element. This ratio is tight under the Unique Games Conjecture [12]. Note that these are all sequential algorithms.

Considering parallel algorithms for MinWSC, Berger et al. [4] gave a parallel algorithm whose approximation ratio and number of rounds are polylogarithmic in the number of elements and the number of sets. In [16], using the primal-dual schema, Rajagopalan and Vazirani gave a parallel algorithm with an improved number of rounds at the cost of a weaker approximation ratio. In [6], by proposing a concept called nearly independent sets, Blelloch et al. were able to further improve the approximation ratio, with the number of rounds depending on the sum of the sizes of all sets. These parallel algorithms for MinWSC achieve approximation ratios measured in terms of the number of elements. In [13], by a primal-dual method, Khuller et al. presented a parallel algorithm for MinWSC whose approximation ratio is measured in terms of the maximum frequency. For the partial version of MinWSC, the goal of which is to cover not all elements but at least a given percentage of them, Ran et al. [15] presented a parallel algorithm whose approximation ratio involves a constant parameter.

In [2], Agarwal et al. proposed a property called $\tau$-SNC, which holds for many problems including VC, TC, IC, and some other graph covering problems. Using the primal-dual schema, they presented a parallel algorithm whose approximation ratio depends on $\tau$ and whose number of rounds depends on the depth of the $\tau$-SNC decomposition. They also gave a distributed algorithm for MinWSC with the $\tau$-SNC property, with an approximation guarantee achieved within a bounded number of communication rounds.

The parallel algorithm in [2] consists of a forward phase and a deletion phase. In the conclusion part, three questions were proposed:

1. In the forward phase, can one construct nearly-maximal solutions via a procedure with a smaller running time?

2. In the deletion phase, the algorithm produces a solution satisfying the primal slackness property with a certain parameter. Can this parameter be improved?

3. The deletion phase takes a certain number of iterations. Can this number be reduced?

### 1.2 Our Contributions

In this paper, using an improved primal-dual method, we give a parallel algorithm for MinWSC with the $\tau$-SNC property, achieving an improved approximation ratio in a comparable number of rounds, where the accuracy parameter $\varepsilon$ is a constant. This work not only improves the approximation ratio of [2], but also positively answers two of the three questions posed in [2].

In the forward phase of [2], the authors used a parallel primal-dual algorithm to obtain a nearly maximal solution. In our algorithm, we use a different idea, increasing the dual variables in a geometric series, and obtain a nearly maximal solution with a sharper guarantee in the same number of rounds as [2]. This gives a positive answer to the first question.

In the deletion phase of [2], the authors dealt with the elements from the top layer down to layer 1. In each layer, they found maximal independent sets to obtain a solution satisfying the primal slackness property with a certain parameter within a bounded number of rounds. In this paper, we manage to decrease the primal slackness parameter further. To realize this goal, we propose a random selection method such that the number of rounds needed to reduce the parameter is bounded. A crucial trick is to guarantee that, with constant probability, the number of sets covering a bad element (that is, an element which is covered too many times by the current collection of sets) can be strictly reduced, while feasibility (that is, all elements remaining covered) is maintained. This leads to a positive answer to the second question.

The remaining part of this paper is organized as follows. In Section 2, we introduce some terminologies and definitions used in this paper. In Section 3, we present our parallel algorithm for MinWSC with the SNC property and give a rigorous analysis. Section 4 concludes the paper and gives further discussions.

## 2 Preliminaries

In this section, we give some terminologies and definitions used in this paper.

###### Definition 2.1 (MinWSC).

Given a weighted set system $(E, \mathcal{S}, w)$, where $E$ is a ground set of elements, $\mathcal{S}$ is a collection of subsets of $E$, and $w \colon \mathcal{S} \to \mathbb{R}_{\geq 0}$ is a weight function on $\mathcal{S}$, the goal of MinWSC is to find a minimum weight subcollection of $\mathcal{S}$ covering all elements, where the set of elements covered by a subcollection $\mathcal{A} \subseteq \mathcal{S}$ is $\bigcup_{S \in \mathcal{A}} S$, and the weight of $\mathcal{A}$ is $w(\mathcal{A}) = \sum_{S \in \mathcal{A}} w(S)$.
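As a concrete illustration with toy data of our own (not an instance from the paper), the coverage and weight of a subcollection can be computed as follows:

```python
# Toy model of a weighted set system: elements are integers,
# each set is a frozenset paired with its weight.
def covered(subcollection):
    """Union of all sets in the subcollection."""
    out = set()
    for s, _w in subcollection:
        out |= s
    return out

def weight(subcollection):
    """Total weight of the subcollection."""
    return sum(w for _s, w in subcollection)

# A tiny instance with ground set {1,...,5}.
system = [
    (frozenset({1, 2, 3}), 2.0),
    (frozenset({3, 4}), 1.0),
    (frozenset({4, 5}), 1.5),
    (frozenset({1, 5}), 1.0),
]

pick = [system[0], system[2]]          # {1,2,3} and {4,5}
assert covered(pick) == {1, 2, 3, 4, 5}  # this subcollection is a set cover
assert weight(pick) == 3.5
```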

For any element $e$, let $\mathcal{S}(e)$ be the subcollection of $\mathcal{S}$ consisting of the sets containing $e$, and let $\mathcal{S}(e, e')$ denote the collection of sets containing both $e$ and $e'$. We say that

$e$ and $e'$ are neighbors in $\mathcal{S}$ if $e$ and $e'$ are both contained in some $S \in \mathcal{S}$. | (1)

In other words, the neighbors of $e$ in $\mathcal{S}$ constitute the set $\bigcup_{S \in \mathcal{S}(e)} S$. Note that $e$ is a neighbor of itself by this definition.

For an easier understanding of the SNC property, let us first consider the interval cover problem, in which a set of points on a line is to be covered by the minimum number of intervals chosen from a given collection of intervals. Note that an interval cover instance can be viewed as a set cover instance by viewing each interval as the set of points contained in it (see Fig. 1 for an illustration). Observe that in any minimal interval cover, any point belongs to at most two intervals of the cover. In fact, for any point $p$, let $I_\ell$ and $I_r$ be the intervals of the cover containing $p$ with the leftmost and the rightmost endpoints, respectively; then $\{I_\ell, I_r\}$ covers all neighbors of $p$. For example, in Fig. 1, a point may belong to four intervals of a cover, but once the leftmost-reaching and rightmost-reaching of them are taken, all neighbors of the point are covered by these two, and the other two intervals are not needed. Such a property is preferred because reducing the frequency of points in the selected subcollection of sets leads to a better approximation ratio.
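The frequency-two observation for minimal interval covers can be checked on a small instance of our own (the pruning routine below is a naive sketch, not part of the paper's algorithms):

```python
# Toy check: pruning an interval cover until it is minimal leaves
# every point inside at most two of the remaining intervals.
def covers(intervals, points):
    return all(any(a <= p <= b for (a, b) in intervals) for p in points)

def make_minimal(cover, points):
    """Repeatedly drop any interval whose removal keeps a valid cover."""
    cover = list(cover)
    changed = True
    while changed:
        changed = False
        for i in range(len(cover)):
            rest = cover[:i] + cover[i + 1:]
            if covers(rest, points):
                cover = rest
                changed = True
                break
    return cover

points = [1, 3, 5, 7, 9]
cover = [(0, 4), (2, 6), (4, 8), (1, 9), (6, 10)]
minimal = make_minimal(cover, points)
assert covers(minimal, points)
# Frequency of each point in the minimal cover is at most 2.
freq = {p: sum(a <= p <= b for (a, b) in minimal) for p in points}
assert max(freq.values()) <= 2
```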

Even better, in the above example the leftmost point belongs to only one interval in any minimal solution: among all selected intervals covering this point, the one with the rightmost endpoint covers all its neighbors. Call the points satisfying such a stronger property good points. Note that not all points are good, but this property is hereditary, in the sense that any sub-instance has good points. So, we can decompose all points into layers by first finding all good points in the original instance, removing them, and then iteratively finding the good points of the residual instances.

These observations motivate the definition of the $\tau$-SNC property proposed in [2]. Because of the above consideration of decomposing the remaining instance, it is defined in a more general setting: restricted to an arbitrary element set and an arbitrary subcollection of sets.

###### Definition 2.2 ($\tau$-collapsible and base group set).

For any subset of elements containing a given element and any subcollection of sets, the neighborhood of the element restricted to this pair is the set of its neighbors in the subcollection that belong to the subset. We say that the element is $\tau$-collapsible if there exists a collection of at most $\tau$ sets from the subcollection covering this restricted neighborhood. Such a collection is called a base group set of the element restricted to the pair.

###### Example 2.3.

For the example in Fig. 1, , , where is the set of points covered by interval . Suppose and . Then, , . Note that is 2-collapsible since is a base group set of restricted to .

###### Definition 2.4 ($\tau$-SNC).

Given a set system, for a subset of elements and an element in it, call the element a $\tau$-SNC element of the subset if, for any subcollection of sets, it is $\tau$-collapsible restricted to the subset and that subcollection. The system is said to have the $\tau$-SNC property if every nonempty subset of elements contains a $\tau$-SNC element.

The $\tau$-SNC property is hereditary in the following sense: if an element is a $\tau$-SNC element of a subset, then it is also a $\tau$-SNC element of any smaller subset containing it.

###### Remark 2.5.

The interval cover problem has the 1-SNC property. In fact, for any subset of points, the leftmost point is a 1-SNC element: for any subcollection of intervals, the interval containing it with the rightmost endpoint covers its whole restricted neighborhood.

For the instance in Fig. 1, if we consider the whole point set, then the leftmost point is a 1-SNC element: for any subcollection of intervals, a single interval covers all points of its restricted neighborhood, and thus a base group set consists of only one set. As shown in Example 2.3, some point is only 2-collapsible; nevertheless, the instance does have 1-SNC elements. It should be noticed that the empty collection can serve as a base group set when the restricted neighborhood is empty: the size of a base group set is only required to be at most $\tau$, not exactly $\tau$. Also notice that after removing the 1-SNC elements, new points become 1-SNC elements of the residual point set.

###### Definition 2.6 (layer decomposition).

Given a set system with the SNC property, the layer decomposition of its ground set is a partition into layers, where the first layer is the set of all $\tau$-SNC elements of the ground set, and each subsequent layer is the set of all $\tau$-SNC elements of what remains after the previous layers are removed. The number of layers is called the layer depth, and the elements of the $j$-th layer are said to have layer level $j$.
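The peeling process behind this definition can be sketched as follows; the predicate `is_snc_element` is a stand-in of ours for the $\tau$-SNC test, not a routine from the paper:

```python
def layer_decomposition(elements, is_snc_element):
    """Peel off, layer by layer, the elements that the predicate
    declares SNC in the current residual instance.

    is_snc_element(e, residual) -> bool stands in for the tau-SNC
    test of Definition 2.4.
    """
    residual = set(elements)
    layers = []
    while residual:
        layer = {e for e in residual if is_snc_element(e, residual)}
        if not layer:                 # no SNC element: property fails
            raise ValueError("instance lacks the SNC property")
        layers.append(layer)
        residual -= layer
    return layers

# For interval cover, the leftmost point of any residual set is a
# 1-SNC element (Remark 2.5); as a toy predicate, declare exactly
# the minimum of the residual set to be SNC.
layers = layer_decomposition({1, 2, 3}, lambda e, res: e == min(res))
assert layers == [{1}, {2}, {3}]   # layer depth 3 on this toy instance
```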

The following result was proved in [2].

###### Lemma 2.7.

Given a set system and a constant $\tau$, there exists a procedure which can test whether the system has the $\tau$-SNC property and, if it does, output the layer decomposition. The procedure takes a number of iterations proportional to the layer depth and can be implemented in parallel.

For any sub-collection with the form , denote by for . For any element , denote by the layer level of .

## 3 Parallel algorithm for MinWSC with $\tau$-SNC

In this section, we give a parallel algorithm for MinWSC with the $\tau$-SNC property, using a primal-dual method.

### 3.1 Algorithm

MinWSC can be modeled as an integer program as follows, where $x_S = 1$ indicates that set $S$ is picked and $x_S = 0$ otherwise:

$$\min \sum_{S \in \mathcal{S}} w(S)\, x_S \quad \text{s.t.} \quad \sum_{S \colon e \in S} x_S \geq 1 \ \ \forall e \in E, \qquad x_S \in \{0, 1\} \ \ \forall S \in \mathcal{S}. \tag{2}$$

The integer program (2) can be relaxed to a linear program as follows:

$$\min \sum_{S \in \mathcal{S}} w(S)\, x_S \quad \text{s.t.} \quad \sum_{S \colon e \in S} x_S \geq 1 \ \ \forall e \in E, \qquad x_S \geq 0 \ \ \forall S \in \mathcal{S}. \tag{3}$$

Its dual program is as follows:

$$\max \sum_{e \in E} y_e \quad \text{s.t.} \quad \sum_{e \in S} y_e \leq w(S) \ \ \forall S \in \mathcal{S}, \qquad y_e \geq 0 \ \ \forall e \in E. \tag{4}$$

The algorithm consists of two phases: a forward phase (lines 1 to 16 of Algorithm 1) and a deletion phase (Algorithm 2). The forward phase constructs a collection of subcollections which together cover all elements. The deletion phase removes redundant sets. In the deletion phase of [2], the authors defined a primal slackness property with a parameter as follows: for any element $e$ with $y_e > 0$, at most that many picked sets contain $e$. Their approximation ratio was derived by bounding this slackness parameter, and our algorithm reduces the parameter further.

The forward phase essentially employs a primal-dual schema: starting from $y \equiv 0$, dual variables are increased until some dual constraint becomes nearly tight, where the dual constraint corresponding to a set $S$ (see (4)) is nearly tight if the remaining weight $w(S) - \sum_{e \in S} y_e$ is sufficiently small; at that time, all nearly tight sets are picked (lines 11 and 12 of Algorithm 1). Note that elements are dealt with layer by layer. Call each round of the for loop of Algorithm 1 an epoch. The epoch for a layer tries to cover, using a collection of picked subcollections, all elements of that layer which are not covered by subcollections picked in previous epochs. In order to efficiently control the number of rounds, the dual variables are increased in a geometric manner (see lines 7, 9, and 14 of Algorithm 1). After the forward phase, we get a feasible solution for MinWSC.

The goal of the deletion phase is to ensure that the sub-collection output in line 17 of Algorithm 1 is a feasible solution of MinWSC satisfying the following property:
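The geometric dual increase can be sketched as follows. This is a simplification of our own, not the paper's Algorithm 1: `eps` and the near-tightness threshold `(1 - eps) * w[sid]` are assumed parameters, and the layer-by-layer processing is omitted.

```python
def forward_phase(elements, sets, w, eps=0.1):
    """Schematic primal-dual forward phase: raise the dual variables
    of uncovered elements by a geometrically growing increment until
    the dual constraint of some set is nearly tight, then pick all
    nearly tight sets.  `sets` maps a set id to a frozenset.
    """
    y = {e: 0.0 for e in elements}
    picked = set()
    uncovered = set(elements)
    # Initial increment: a small fraction of the cheapest weight.
    delta = eps * min(w.values()) / max(len(s) for s in sets.values())
    while uncovered:
        for e in uncovered:
            y[e] += delta
        for sid, s in sets.items():
            # Nearly tight: dual load within an eps factor of the weight.
            if sid not in picked and sum(y[e] for e in s) >= (1 - eps) * w[sid]:
                picked.add(sid)
                uncovered -= s
        delta *= 2          # geometric growth bounds the number of rounds
    return picked, y

sets = {0: frozenset({1, 2}), 1: frozenset({2, 3}), 2: frozenset({3, 4})}
w = {0: 1.0, 1: 2.0, 2: 1.0}
picked, y = forward_phase({1, 2, 3, 4}, sets, w)
assert set().union(*(sets[i] for i in picked)) == {1, 2, 3, 4}
```

Doubling `delta` means only logarithmically many rounds are needed before any fixed dual constraint becomes nearly tight, which is the point of the geometric increase.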

for any element $e$ with $y_e > 0$, the number of sets of the output covering $e$ is at most the target slackness parameter. | (5)

To realize this goal, elements are dealt with in reverse order, from the top layer down to layer 1. When dealing with the elements in a layer, the current collection is maintained as a collection of sets covering

(6) |

Initially, the collection is the output of the forward phase. Some element with a positive dual variable might be covered by too many sets of the current collection, so we have to shrink the collection to satisfy property (5). Because every element of the current layer is a $\tau$-SNC element of the residual instance (by the definition of layer decomposition in Definition 2.6), its neighbors can be covered by a base group set consisting of at most $\tau$ sets. So, an idea is to select a base group set for each element. However, there is a synchronization problem: it might happen that a set covering both elements $e$ and $e'$ is picked into the base group set of $e$ but not into the base group set of $e'$, and thus it is still possible for $e'$ to be covered by more than $\tau$ sets of the union of these base group sets. To avoid this problem, the algorithm finds base group sets only for a maximal independent set of elements, where a set of elements is a maximal independent set if no two of its elements are neighbors of each other and adding any further element destroys the independence property. To realize this idea, a sequence of auxiliary graphs and a sequence of maximal independent sets are constructed as follows: the vertices of the auxiliary graph are those elements of the current layer not adjacent to previously chosen independent elements,

two vertices of the auxiliary graph are adjacent if they are neighboring elements, | (7)

and a maximal independent set of this graph is computed. From such a construction,
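The maximal-independent-set step can be sketched sequentially as follows (the paper computes such sets in parallel; the toy neighbor relation below is our own assumption):

```python
def maximal_independent_set(vertices, adjacent):
    """Greedy maximal independent set: scan vertices in order and keep
    a vertex iff it is not adjacent to any already-kept vertex.
    `adjacent(u, v)` models the neighbor relation of (7).
    """
    mis = []
    for v in vertices:
        if all(not adjacent(v, u) for u in mis):
            mis.append(v)
    return mis

# Toy neighbor relation: integers are neighbors iff they differ by 1.
mis = maximal_independent_set([1, 2, 3, 4, 5], lambda u, v: abs(u - v) == 1)
assert mis == [1, 3, 5]
# Maximality: every excluded vertex is adjacent to some chosen one.
assert all(any(abs(u - v) == 1 for u in mis) for v in [2, 4])
```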

(8) |

For each element of the maximal independent set, find a base group set. In line 10 of Algorithm 2, the working collection is set to be the union of these base group sets. Note that

(9) |

In fact, if a set belonged to the base group sets of two distinct independent elements, then these two elements would be neighbors, contradicting (8). We shall prove in Claim 1 of Lemma 3.2 that any element is covered by only a bounded number of sets of this union. This part is essentially the same as in [2]. To further shrink the collection to satisfy property (5), for each element of the maximal independent set, we flip a fair coin. If it is head, then a subcollection of its base group set is picked in a random manner. If it is tail, then all sets of its base group set are picked. By (9),

the random choices made for different independent elements are independent events. | (10)

The role of the coin flips is to help find a collection of sets which can be deleted from the current collection without affecting the covering requirement (see lines 20 to 26). We shall prove in Lemma 3.2 that after a bounded number of rounds of the while loop, the elements satisfy property (5) with high probability.

### 3.2 Analysis

The following lemma shows the feasibility of the collection output by Algorithm 1.

###### Lemma 3.1.

The collection returned by Algorithm 1 is a set cover.

###### Proof.

We shall prove, by induction on the layer index from the top layer down to 1, that

in Algorithm 2, after processing a layer, the current collection covers all elements of the corresponding residual element set, | (11)

where this element set is defined in (6). Then the lemma follows from (11) applied to the first layer.

First consider the inductive basis for the top layer. Initially the collection is the output of the forward phase, which covers all elements. For any element of the top layer, if it belongs to the maximal independent set, then it is covered by the collection constructed in line 10. Otherwise, by the maximality of the independent set, the element is adjacent to some independent element, and by the SNC property, the base group set of that element covers it. In any case, the collection in line 10 covers the element. Since lines 21 to 26 only remove redundant sets, the collection in line 26 still covers all elements of the top layer. The inductive basis is proved.

The next lemma gives some important properties of .

###### Lemma 3.2.

The collection returned by Algorithm 1 satisfies the following properties:

(i) with high probability, any element with a positive dual variable is covered by only a bounded number of sets of the output;

(ii) for any set $S$ in the output, $w(S) \leq (1+\varepsilon) \sum_{e \in S} y_e$, where $y$ is the set of dual variables at the end of the algorithm.

###### Proof of property (i).

It suffices to prove that after the epoch of Algorithm 2 processing a given layer, which is the corresponding round of the outer for loop,

any element of this layer with a positive dual variable is covered by only a bounded number of sets of the current collection, with high probability. | (12)

In fact, notice that no set picked while processing a lower layer can cover any element of the current layer (because of the definition of the residual element set in line 6 of the forward phase). So, as long as property (12) holds right after the epoch for this layer, it is maintained throughout the processing of all lower layers, and thus property (i) follows.

In the following, all line labels we mention refer to Algorithm 2. We first bound, for any element, the number of sets of the collection in line 10 that cover it.

Claim 1. Every element is covered by only a bounded number of sets of the collection constructed in line 10.

If the element belongs to the maximal independent set, then the sets of its base group set are the only sets of the collection covering it. In fact, if a set from the base group set of a different independent element covered it, then that set would cover both elements, and thus they would be neighbors, contradicting the independence of the chosen elements. It follows that

(13) |

Next, consider an element outside the maximal independent set. We first prove that

(14) |

Denote by $D$ the set of independent elements which are neighbors of the considered element (recall that two elements are neighbors if some set contains both). Since the considered element is a $\tau$-SNC element, there exists a subcollection of size at most $\tau$ covering all of its neighbors, and in particular all elements of $D$. If $|D| > \tau$, then by the pigeonhole principle, some set of this subcollection would cover two elements of $D$, making them neighbors and contradicting the independence of $D$. So, (14) is proved.

Notice that any set of the collection covering the considered element comes from the base group set of some independent element, which must then be a neighbor of the considered element. Combining this with (14) and the fact that every base group set has size at most $\tau$, we have

Claim 1 is proved.

Claim 2. At the end of the while loop of Algorithm 2, the updated collection satisfies, for every element, the bound required by property (5), with high probability.

By (13), the quantity in question is bounded. For each bad element, let the favorable event be that the number of sets covering it is decreased by at least 1 after one round of the while loop. We shall prove that its probability satisfies

(15) |

To prove (15), consider the collection in line 26 at the end of a round of the while loop, and a base group set restricted to it. The idea is the following: in this round, if the selected collection of sets covers the relevant restricted neighborhood, then by the SNC property every other set covering the element in question meets the condition in line 22, and thus can be deleted in line 26, resulting in the favorable event. So in the following, we lower bound the probability of such a favorable selection.

For each , there exists an element with , since every set in comes from a base group set of some . Let . Note that

any is a neighbor of . | (16) |

For each such set, let the corresponding survival event be that the set remains in the collection after the random selection. Its probability

(17) |

By the preceding inequalities, a corresponding bound follows. Consider one of these sets and an associated element. For simplicity of presentation, we only argue one of the two cases (the other can be handled similarly, with an even simpler argument). In this case, we have to consider a conditional probability, which equals

Since , we have

Hence

(18) |

Since any element in (including ) is a neighbor of (see (16)), by (14), we have

(19) |

Since we have shown in (10) that the random selections are independent, the events under consideration are mutually independent. Combining this observation with (17), (18), and (19), the probability satisfies

and inequality (15) follows.

Denote by $T$ the number of rounds needed for the quantity in question to decrease below the required bound. Combining Claim 1 and inequality (15), we can bound the expectation of $T$:

By Markov’s inequality,

(20) |

Let the bad event be that the bound has not yet been reached after the stated number of rounds. By inequality (20),

(21) |

By the union bound,

Claim 2 is proved. Then, by the argument at the beginning of this proof, property (i) is proved. ∎
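For reference, since the displayed inequalities are not reproduced above, the two standard tools invoked in this argument, Markov's inequality and the union bound, read:

```latex
\Pr[X \ge a] \;\le\; \frac{\mathbb{E}[X]}{a}
  \qquad (X \ge 0,\ a > 0;\ \text{Markov's inequality}),
\qquad
\Pr\Bigl[\bigcup\nolimits_i A_i\Bigr] \;\le\; \sum\nolimits_i \Pr[A_i]
  \qquad \text{(union bound)}.
```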

###### Proof of property (ii).

Assume that the number of iterations in the th epoch of Algorithm 1 is . For the th iteration of the while loop in the th epoch, denote by the residual weight in line 10, the residual element set in line 13, and the current dual variable for element . Suppose is picked into in the th iteration of the