Recently, submodular optimization has attracted a lot of interest in machine learning and data mining, where it has been applied to a variety of problems including viral marketing, information gathering, etc.
In this paper, we study a parallel algorithm for the minimum cost submodular cover problem (MinSMC). Given a monotone nondecreasing submodular function $f\colon 2^V\to\mathbb{Z}_{\geq 0}$, a cost function $c\colon V\to\mathbb{R}_{>0}$, and an integer $k\leq f(V)$, the goal of MinSMC is to find a subset $S\subseteq V$ with the minimum cost such that $f(S)\geq k$, where the cost of $S$ is $c(S)=\sum_{e\in S}c(e)$. MinSMC has numerous applications, including data summarization, recommender systems, etc. For example, given a set of data, it is desirable to select a cheapest subset of the data whose utility meets a required lower bound. Many commonly used utility functions exhibit submodularity, a natural diminishing-returns property, leading to MinSMC problems. For MinSMC, a centralized greedy algorithm is known to have approximation ratio $H(\Delta)$, where $H(j)=\sum_{i=1}^{j}1/i$ is the $j$th Harmonic number and $\Delta=\max_{e\in V}f(\{e\})$.
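As a concrete illustration of the centralized greedy mentioned above, the following sketch runs it on a small weighted coverage instance. The instance, the element names, and the coverage utility are all hypothetical, chosen only to make the profit-per-cost selection rule visible; the paper's algorithmic details may differ.

```python
# Hypothetical illustration of the centralized greedy for MinSMC.
# f(S) = number of ground items covered by the sets in S (monotone and
# submodular); the greedy repeatedly adds the element maximizing
# marginal profit per unit cost until f(S) >= k.

def greedy_min_submodular_cover(elements, f, cost, k):
    """Pick argmax of (marginal profit / cost) until f(S) >= k."""
    S = set()
    while f(S) < k:
        best, best_ratio = None, 0.0
        for e in elements - S:
            gain = f(S | {e}) - f(S)
            if gain > 0 and gain / cost[e] > best_ratio:
                best, best_ratio = e, gain / cost[e]
        if best is None:          # k > f(V): no feasible solution exists
            return None
        S.add(best)
    return S

# Toy coverage instance: each "element" is a set of ground items.
sets_ = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5}}
cost = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 4.0}
f = lambda S: len(set().union(*(sets_[e] for e in S))) if S else 0

S = greedy_min_submodular_cover(set(sets_), f, cost, k=5)
print(sorted(S), sum(cost[e] for e in S))
```

Note that the greedy prefers the three cheap sets (total cost 3) over the single expensive set "d" (cost 4), exactly the behavior the ratio rule is designed to produce.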
However, when facing massive data, a sequential and centralized greedy method is impractical, and parallel methods have been proposed recently. The best known parallel algorithm for the unweighted MinSMC problem was presented by Fahrbach et al. in , which produces a solution of size at most in at most rounds, where is an optimal solution. Note that the algorithm in only deals with the unweighted MinSMC problem. Furthermore, the approximation ratio is and might be as large as , while might be much smaller than . This observation motivates us to study an NC parallel algorithm for the weighted MinSMC problem, trying to obtain an approximation ratio arbitrarily close to .
1.1 Related Work
For the MinSMC problem, Wolsey  presented a greedy algorithm with approximation ratio , where .
Mirzasoleiman et al.  proposed a distributed algorithm for the unweighted MinSMC problem called DisCOVER, which reduces the problem to a set of cardinality-constrained submodular maximization problems. Employing a greedy algorithm for the cardinality-constrained submodular maximization problem, for any fixed constant , DisCOVER can find a solution of size in rounds of messages, where denotes the number of machines. As noted in , it is strange that in this result, when the number of machines is increased, the number of rounds increases (rather than decreases). The authors of  then improved the result to a distributed -approximation algorithm running in at most rounds of messages, where is the number of elements. These algorithms have suboptimal adaptivity complexity because the summarization algorithm on the central machine is sequential: the number of rounds on the central machine can be . A parallel algorithm with low adaptivity complexity was presented in , with approximation ratio at most in at most rounds.
For some special cases of the submodular cover problem, parallel algorithms have been studied recently. In particular, for the set cover problem (i.e., finding a smallest subcollection of sets that covers all elements), Berger et al.  provided the first parallel algorithm with an approximation guarantee similar to that of the centralized greedy algorithm. They used a bucketing technique to obtain a -approximation in rounds, where is the total sum of the sets’ sizes. Rajagopalan and Vazirani  improved the number of rounds to at the cost of a larger approximation ratio of . Blelloch et al.  further improved these results by obtaining a -approximation algorithm in rounds.
1.2 Our Contributions and Technical Overview
In this paper, we design a parallel algorithm for MinSMC achieving approximation ratio at most with probability at least , which runs in rounds, where , and is a constant in . This is the first paper studying a parallel algorithm for the weighted version of MinSMC. Furthermore, the approximation ratio in this paper is arbitrarily close to , while the -approximation in  only works for the cardinality version, and might be much smaller than .
We first tried the following method for MinSMC: iteratively call the parallel algorithm for the submodular maximization problem with a knapsack constraint in  until a feasible solution to MinSMC is found. This method runs in a logarithmic number of rounds, and its approximation ratio is . Improving the dependence of the ratio on to a dependence on requires more effort.
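The baseline just described can be sketched as follows. The simple greedy inside is only a stand-in for the parallel knapsack-constrained maximization algorithm the paper refers to, and the budget `B` is a hypothetical guess of the optimum cost; both are assumptions for illustration.

```python
# Hedged sketch of the rejected baseline: repeatedly run a
# knapsack-constrained submodular maximization routine (here a plain
# greedy stand-in) with per-round budget B until f(S) >= k.

def knapsack_greedy_max(f, S0, U, cost, budget):
    """Greedy: add best gain-per-cost elements while the budget allows."""
    S, spent = set(S0), 0.0
    while True:
        best, best_ratio = None, 0.0
        for e in U - S:
            gain = f(S | {e}) - f(S)
            if spent + cost[e] <= budget and gain / cost[e] > best_ratio:
                best, best_ratio = e, gain / cost[e]
        if best is None:
            return S
        S.add(best)
        spent += cost[best]

def cover_by_iterated_max(f, U, cost, k, budget):
    S = set()
    while f(S) < k:
        T = knapsack_greedy_max(f, S, U, cost, budget)
        if T == S:               # no progress: budget guess too small
            return None
        S = T
    return S

sets_ = {"a": {1, 2}, "b": {3, 4}, "c": {5, 6}}
cost = {"a": 1.0, "b": 1.0, "c": 1.0}
f = lambda S: len(set().union(*(sets_[e] for e in S))) if S else 0
S = cover_by_iterated_max(f, set(sets_), cost, k=6, budget=2.0)
print(sorted(S))
```

Each outer iteration costs at most one budget's worth, which is why the resulting ratio depends on the number of outer rounds rather than on the finer quantity the paper ultimately targets.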
This paper combines the idea of multi-layer buckets in , the maximal nearly independent set in , and random sampling in . Note that  and  deal with the set cover problem. Since the submodular cover structure is much more complicated than the set cover structure, the methods in  and  cannot be directly applied to MinSMC; applied separately, both of them encounter structural difficulties. The paper  deals with a cardinality-budgeted version of the submodular maximization problem. To adapt its idea to the weighted version of MinSMC, new ideas have to be explored, especially on how to deal with weights.
2 Parallel Algorithm and Analysis for MinSMC
Definition 2.1 (submodular and monotone nondecreasing).
Given an element set $V$ and a function $f\colon 2^V\to\mathbb{R}$, $f$ is submodular if $f(A)+f(B)\geq f(A\cup B)+f(A\cap B)$ for any $A,B\subseteq V$; $f$ is monotone nondecreasing if $f(A)\leq f(B)$ for any $A\subseteq B\subseteq V$.
For any set $A\subseteq V$ and element $e\in V$, denote by $f_A(e)=f(A\cup\{e\})-f(A)$ the marginal profit of $e$ over $A$. Assume $f(\emptyset)=0$. In this paper, $f$ is always assumed to be an integer-valued, monotone nondecreasing, submodular function. It can be verified that for any $A\subseteq V$, the marginal profit function $f_A(\cdot)$, defined by $f_A(S)=f(A\cup S)-f(A)$ for $S\subseteq V$, is also a monotone nondecreasing, submodular function.
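The diminishing-returns behavior of marginal profits can be checked numerically. The snippet below uses a small coverage function as a stand-in for $f$ (a hypothetical instance, not one from the paper) and exhaustively verifies that the marginal profit of an element can only shrink as the base set grows.

```python
from itertools import combinations

# Numerical sanity check of Definition 2.1 on a toy coverage function:
# marginal profits f_A(e) = f(A ∪ {e}) − f(A) shrink as A grows.

universe = {"u1": {1, 2}, "u2": {2, 3}, "u3": {3, 4, 5}}
f = lambda S: len(set().union(*(universe[e] for e in S))) if S else 0

def marginal(e, A):
    return f(A | {e}) - f(A)

# Diminishing returns: for A ⊆ B and e ∉ B, f_A(e) >= f_B(e).
V = set(universe)
ok = all(
    marginal(e, set(A)) >= marginal(e, set(B))
    for r in range(len(V) + 1)
    for B in combinations(V, r)
    for s in range(r + 1)
    for A in combinations(B, s)
    for e in V - set(B)
)
print(ok)
```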
Definition 2.2 (Minimum Submodular Cover Problem (MinSMC)).
Given a monotone nondecreasing submodular function $f\colon 2^V\to\mathbb{Z}_{\geq 0}$, a cost function $c\colon V\to\mathbb{R}_{>0}$, and an integer $k\leq f(V)$, the goal of MinSMC is to find $S\subseteq V$ satisfying
\[ \min\{c(S)\colon f(S)\geq k,\ S\subseteq V\}. \tag{1} \]
Define a function $f_k$ as $f_k(A)=\min\{f(A),k\}$ for any subset $A\subseteq V$. When $f$ is a monotone nondecreasing submodular function, it can be verified that $f_k$ is also a monotone nondecreasing submodular function. Note that $f(S)\geq k$ holds if and only if $f_k(S)=k$, and for the modified MinSMC problem
\[ \min\{c(S)\colon f_k(S)=k,\ S\subseteq V\}, \tag{2} \]
a set is feasible to (2) if and only if it is feasible to (1). Hence problems (1) and (2) are equivalent in terms of approximability: an approximate solution to problem (2) with a given ratio is an approximate solution to problem (1) with the same ratio, and vice versa. In the following, we concentrate on the modified MinSMC problem (2).
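The equivalence of (1) and (2) can be verified exhaustively on a tiny instance. The snippet assumes the standard truncation $f_k(A)=\min\{f(A),k\}$ and a hypothetical coverage utility; it checks that both problems have exactly the same feasible sets.

```python
from itertools import combinations

# Sketch of the truncation trick: a set S satisfies f(S) >= k exactly
# when f_k(S) = k, so instances (1) and (2) share their feasible sets.

universe = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
f = lambda S: len(set().union(*(universe[e] for e in S))) if S else 0
k = 3
f_k = lambda S: min(f(S), k)     # assumed standard construction

V = set(universe)
subsets = [set(c) for r in range(len(V) + 1) for c in combinations(V, r)]
same_feasible = all((f(S) >= k) == (f_k(S) == k) for S in subsets)
print(same_feasible)
```

The point of the truncation is that $f_k$ is still monotone and submodular, but its maximum value is exactly $k$, which turns the coverage constraint into a "maximize to the top" condition.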
The concept of -maximal nearly independent set (-MaxNIS) plays a crucial role in the analysis of the parallel algorithm proposed in  for the minimum set cover problem. This paper uses a slightly different concept which only needs the nearly independent property.
Definition 2.3 (-nearly independent set (-NIS)).
For a real number and a set , we say that a set is an -NIS with respect to and if it satisfies the following nearly independent property:
The main algorithm is described in Algorithm 1. In lines 1 to 7, the instance is preprocessed; the purpose is to ensure that the modified instance satisfies , so that the number of rounds can be bounded by the input size, where and are the maximum and the minimum cost of elements, respectively. The sub-procedure MinSMC-Par (described in Algorithm 2) is called in line 8 of Algorithm 1.
Algorithm 2 (MinSMC-Par) deals with the modified instance . It divides the elements into buckets, first by marginal profit-to-cost ratio, then by marginal profit (see line 10 of Algorithm 2). Priority is given to buckets with a higher profit-to-cost ratio; among buckets with the same profit-to-cost ratio, priority is given to those with higher marginal profit. Algorithm 2 processes the buckets in decreasing order of priority. Note that after some sets are chosen, an element in a bucket of higher priority may drop into a bucket of lower priority. For each bucket, Algorithm 2 tries to find an -NIS using procedure NIS (described in Algorithm 3).
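The two-level priority order can be sketched as follows. The geometric thresholds, the parameter `eps`, and the sample values are all assumptions for illustration; the concrete thresholds used in Algorithm 2 are not reproduced here.

```python
import math

# Hypothetical two-level bucketing, illustrating the priority order of
# Algorithm 2: group first by marginal profit-to-cost ratio, then by
# marginal profit, with geometrically scaled bucket boundaries.

def bucket_index(value, eps=0.5):
    """Index i such that (1+eps)^i <= value < (1+eps)^(i+1)."""
    return math.floor(math.log(value, 1 + eps))

def priority(gain, cost, eps=0.5):
    # Higher ratio bucket first; ties broken by higher profit bucket.
    return (bucket_index(gain / cost, eps), bucket_index(gain, eps))

elems = {"x": (8, 2), "y": (4, 1), "z": (9, 9)}   # e -> (gain, cost)
order = sorted(elems, key=lambda e: priority(*elems[e]), reverse=True)
print(order)
```

Here "x" and "y" share the same ratio bucket (ratio 4), so "x" wins on the profit level, while "z", despite its large profit, has ratio 1 and is processed last, mirroring the rule that the ratio level dominates.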
In the th while-loop of Algorithm 3, an -NIS with respect to is found; after while-loops, an -NIS with respect to is obtained. In each while-loop of Algorithm 3, a for-loop is used to guess the size of the -NIS with respect to . In the for-loop, the mean operation described in Algorithm 4 is called. As will be shown, if is guessed correctly, then , and the random set sampled in line 22 satisfies the property required of a nearly independent set. A set consisting of elements is abbreviated as a -set. When we say “select a -set from uniformly at random”, we mean that elements are selected sequentially from until elements are at hand; thus any specific -set appears with probability . Note that viewing as an ordered set facilitates both its selection and the probabilistic computations.
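The sequential selection just described indeed yields the uniform distribution over fixed-size subsets, which a short simulation confirms. The set, its size, and the number of trials below are arbitrary choices for the demonstration.

```python
import random
from itertools import combinations
from collections import Counter

# Sketch of the sampling step: drawing elements one at a time without
# replacement until t elements are at hand gives each specific t-set
# the same probability 1 / C(|U|, t).

def sample_t_set(U, t, rng):
    pool = list(U)
    chosen = []
    for _ in range(t):
        chosen.append(pool.pop(rng.randrange(len(pool))))
    return frozenset(chosen)

rng = random.Random(0)
U, t, trials = {"a", "b", "c", "d"}, 2, 60_000
counts = Counter(sample_t_set(U, t, rng) for _ in range(trials))
n_sets = len(list(combinations(U, t)))        # C(4,2) = 6 possible 2-sets
expected = trials / n_sets
max_dev = max(abs(c - expected) / expected for c in counts.values())
print(len(counts), round(max_dev, 3))
```

All six 2-sets appear with empirical frequencies close to 1/6, consistent with uniformity regardless of the order in which elements were drawn.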
Algorithm 4 uses the mean value of the function to measure the expected quality of a sampled set, where is a random indicator function defined as follows. Given two sets , a parameter , and a real number , for a random -set selected from uniformly at random, and an element drawn uniformly at random from ,
that is, if , and otherwise. As a convention,
if , define .  (4)
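Since the quantity Algorithm 4 needs is the expectation of a 0/1 indicator, it can be estimated by Monte Carlo sampling. In the sketch below the indicator is *assumed* to be "the marginal profit of a random element over the current set plus a random sampled set still meets a threshold"; the paper's exact indicator, and the exact convention in (4), are not legible in this copy, so a zero default for undersized ground sets is likewise an assumption.

```python
import random

# Hedged sketch of the mean operation: estimate E[I] by sampling a
# random t-set T' and a random leftover element e, and testing whether
# the marginal profit of e over A ∪ T' still meets the threshold tau.

def estimate_mean(f, A, U, t, tau, samples, rng):
    if len(U) <= t:              # assumed stand-in for convention (4)
        return 0.0
    hits = 0
    for _ in range(samples):
        pool = list(U)
        rng.shuffle(pool)
        T_prime, rest = set(pool[:t]), pool[t:]
        e = rng.choice(rest)
        base = A | T_prime
        if f(base | {e}) - f(base) >= tau:
            hits += 1
    return hits / samples

sets_ = {"a": {1, 2}, "b": {1, 2}, "c": {3, 4}, "d": {5}}
f = lambda S: len(set().union(*(sets_[x] for x in S))) if S else 0
rng = random.Random(1)
m = estimate_mean(f, set(), set(sets_), t=1, tau=2, samples=4000, rng=rng)
print(round(m, 2))
```

On this toy instance the exact expectation can be enumerated by hand (it is 7/12), and the estimate lands close to it, which is the only property the algorithm relies on.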
The next lemma shows that the expectation of is monotone non-increasing with respect to the sample size . Since what matters in this lemma is the sample size , we use as an abbreviation of .
Given , suppose and are two integers with . Then
Assume . It can be calculated that
where the first inequality comes from the submodularity of function , and the last inequality holds because is sampled from and function is nonnegative. ∎
With probability at least , if , and if .
2.3 Performance analysis
The next lemma shows that the expected size of decreases exponentially, which implies that in Algorithm 3, a bucket will become empty in at most rounds.
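A back-of-envelope check of why exponential decrease gives a logarithmic round bound: assuming each round at least halves the bucket (the lemma's actual shrinkage constant is not legible in this copy, so 1/2 is a stand-in), the bucket empties within about $\log_2 n$ rounds.

```python
# Geometric decay of an integer bucket size: halving each round
# empties a bucket of size n in roughly log2(n) rounds.

def rounds_to_empty(n, shrink=0.5):
    rounds = 0
    while n > 0:
        n = int(n * shrink)      # integer sizes: round down
        rounds += 1
    return rounds

print(rounds_to_empty(1024))
```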
The inequality is obvious if . In the following, assume .
By the assumption of this lemma, we have . Then by Lemma 2.5, with probability at least ,
Note that after is picked, is included into only when ; also note that this term is zero if , so
It follows that
where the second inequality uses the observation (since ). Combining this with inequality (7), we have . Thus . The lemma is proved. ∎
For clarity, we call the bucket in line 15 of Algorithm 2 a subordinate bucket and the bucket a primary bucket. The following lemma says that for any and , when line 14 of Algorithm 2 outputs the set , the subordinate bucket becomes empty with a certain probability.
So, with probability at least , we have
Let denote the event . We have proved . Using Markov’s inequality, . So . The lemma is proved. ∎
The following corollary shows that when the inner while loop of Algorithm 2 halts, the primary bucket is empty with a certain probability.
with probability at least .
where is the set in line 23. For clarity, denote the size of by . Inequality (8) is obviously true if or . Next, suppose . Note that line 22 of Algorithm 3 is executed after the for-loop is exited, and the exit must be due to line 17. In fact, if the number of iterations of the for-loop has reached , then , and every in Algorithm 4 is , resulting in (see (4)), at which point the condition of line 17 is satisfied. In the previous round of the for-loop, that is, when takes the value , we must have , and thus
by Lemma 2.5. Assume that , and for any , denote . By the monotonicity of , we have
By the definition of and Markov’s inequality,
where the last inequality comes from the choice of . By line 10 and line 14 of Algorithm 2, when the computation enters Algorithm 3, we have for any in the input set , with respect to the input parameter . It follows that holds for every . Combining this with (14), inequality (8) is proved.
Then, by the union bound, and similarly to the proof of Corollary 2.8, with probability at least ,
Combining this with , with probability at least
where the last inequality comes from and . ∎
For simplicity, we assume that every inner while-loop is executed times. Denote by for any . The following corollary shows that the expected cost-effectiveness of decreases geometrically.
For any , with probability at least .
By Lemma 2.9 and the union bound, with probability at least ,