Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for submodular maximization that does not evaluate the entire stream even once. By carefully subsampling each element of the data stream, our algorithm enjoys the tightest approximation guarantees in various settings while having the smallest memory footprint and requiring the smallest number of function evaluations. More specifically, for a monotone submodular function and a p-matchoid constraint, our randomized algorithm achieves a 4p approximation ratio (in expectation) with O(k) memory and O(km/p) queries per element (k is the size of the largest feasible solution and m is the number of matroids used to define the constraint). For the non-monotone case, our approximation ratio increases only slightly to 4p + 2 − o(1). To the best of our knowledge, our algorithm is the first that combines the benefits of streaming and subsampling in a novel way in order to truly scale submodular maximization to massive machine learning problems. To showcase its practicality, we empirically evaluated the performance of our algorithm on a video summarization application and observed that it outperforms the state-of-the-art algorithm by up to fifty-fold in running time, while maintaining practically the same utility.


1 Introduction

Submodularity characterizes a wide variety of discrete optimization problems that naturally occur in machine learning and artificial intelligence (Bilmes and Bai, 2017). Of particular interest is submodular maximization, which captures many instances of data summarization such as active set selection in non-parametric learning (Mirzasoleiman et al., 2016b), image summarization (Tschiatschek et al., 2014), corpus summarization (Lin and Bilmes, 2011), fMRI parcellation (Salehi et al., 2017), and removal of redundant elements from DNA sequencing (Libbrecht et al., 2018), to name a few.

Often the collection of elements to be summarized is generated continuously, and it is important to maintain in real time a summary of the part of the collection generated so far. For example, a surveillance camera generates a continuous stream of frames, and it is desirable to be able to quickly produce, at any given time point, a short summary of the frames taken so far. The naïve way to handle such a data summarization task is to store the entire set of generated elements, and then, upon request, use an appropriate offline submodular maximization algorithm to generate a summary out of the stored set. Unfortunately, this approach is usually not practical, both because it requires the system to store the entire generated set of elements and because the generation of the summary from such a large amount of data can be very slow. These issues have motivated previous works to use streaming submodular maximization algorithms for data summarization tasks (Gomes and Krause, 2010; Badanidiyuru et al., 2014; Mirzasoleiman et al., 2017b).

The first works (we are aware of) to consider a one-pass streaming algorithm for submodular maximization problems were the work of Badanidiyuru et al. (2014), who described a (1/2 − ε)-approximation streaming algorithm for maximizing a monotone submodular function subject to a cardinality constraint, and the work of Chakrabarti and Kale (2015), who gave a 4p-approximation streaming algorithm for maximizing such functions subject to the intersection of p matroid constraints. The last result was later extended by Chekuri et al. (2015) to p-matchoid constraints. For non-monotone submodular objectives, the first streaming result was obtained by Buchbinder et al. (2015), who described a randomized streaming algorithm achieving a constant-factor approximation for the problem of maximizing a non-monotone submodular function subject to a single cardinality constraint. Then, Chekuri et al. (2015) described a randomized algorithm achieving an O(p)-approximation for the problem of maximizing a non-monotone submodular function subject to a p-matchoid constraint, as well as a deterministic streaming algorithm with an O(p) approximation ratio for the same problem. (The algorithms of Chekuri et al. (2015) use an offline algorithm for the same problem in a black box fashion, and their approximation ratios depend on the offline algorithm used; the best ratios are obtained with the state-of-the-art offline algorithms of Feldman et al. (2017), which were published only recently.) Finally, very recently, Mirzasoleiman et al. (2017a) came up with a different deterministic algorithm for the same problem with an improved approximation ratio.

In the field of submodular optimization, it is customary to assume that the algorithm has access to the objective function and constraint through oracles. In particular, all the above algorithms assume access to a value oracle that, given a set S, returns the value f(S) of the objective function for this set, and to an independence oracle that, given a set S and an input matroid, answers whether S is feasible or not in that matroid. Given access to these oracles, the algorithms of Chakrabarti and Kale (2015) and Chekuri et al. (2015) for monotone submodular objective functions are quite efficient, requiring only O(k) memory (k is the size of the largest feasible set) and using only O(km) value and independence oracle queries for processing a single element of the stream (m is the number of matroids used to define the p-matchoid constraint). However, the algorithms developed for non-monotone submodular objectives are much less efficient (see Table 1 for their exact parameters).

In this paper, we describe a new randomized streaming algorithm for maximizing a submodular function subject to a p-matchoid constraint. Our algorithm obtains an approximation ratio of 2p + 2√(p(p+1)) + 1 = 4p + 2 − o(1), while using only O(k) memory and O(km/p) value and independence oracle queries (in expectation) per element of the stream, which is even less than the number of oracle queries used by the state-of-the-art algorithm for monotone submodular objectives. Moreover, when the objective function is monotone, our algorithm (with slightly different parameter values) achieves an improved approximation ratio of 4p using the same memory and oracle query complexities, i.e., it matches the state-of-the-art algorithm for monotone objectives in terms of the approximation ratio, while improving over it in terms of the number of value and independence oracle queries used. Additionally, we would like to point out that our algorithm also works in the online model with preemption suggested by Buchbinder et al. (2015) for submodular maximization problems. Thus, our result for non-monotone submodular objectives represents the first non-trivial result in this model for such objectives for any constraint other than a single matroid constraint. For a single matroid constraint, a constant approximation ratio (with a better constant for cardinality constraints) was given by Chan et al. (2017), and our algorithm achieves a ratio of 3 + 2√2 ≈ 5.83 for this case since a single matroid is a 1-matchoid.

In addition to mathematically analyzing our algorithm, we also studied its practical performance in a video summarization task. We observed that, while our algorithm preserves the quality of the produced summaries, it outperforms the running time of the state-of-the-art algorithm by an order of magnitude. We also studied the effect of imposing different p-matchoid constraints on the video summarization.

1.1 Related Work

The work on (offline) maximization of a monotone submodular function subject to a matroid constraint goes back to the classical result of Fisher et al. (1978), who showed that the natural greedy algorithm gives an approximation ratio of 1/2 for this problem. Later, an algorithm with an improved approximation ratio of 1 − 1/e was found for this problem (Călinescu et al., 2011), which is the best that can be done in polynomial time (Nemhauser and Wolsey, 1978). In contrast, the corresponding optimization problem for non-monotone submodular objectives is much less well understood. After a long series of works (Lee et al., 2010a; Vondrák, 2013; Oveis Gharan and Vondrák, 2011; Feldman et al., 2011a; Ene and Nguyen, 2016), the current best approximation ratio for this problem is 0.385 (Buchbinder and Feldman, 2016), which is still far from the state-of-the-art inapproximability result of 0.478 for this problem due to (Oveis Gharan and Vondrák, 2011).

Several works have considered (offline) maximization of both monotone and non-monotone submodular functions subject to constraint families generalizing matroid constraints, including intersections of p matroids (Lee et al., 2010b), p-exchange system constraints (Feldman et al., 2011b; Ward, 2012), p-extendible system constraints (Feldman et al., 2017) and p-system constraints (Fisher et al., 1978; Gupta et al., 2010; Mirzasoleiman et al., 2016a; Feldman et al., 2017). We note that the first of these families is a subset of the p-matchoid constraints studied by the current work, while the last two families generalize p-matchoid constraints. Moreover, the state-of-the-art approximation ratios for all these families of constraints are O(p) both for monotone and non-monotone submodular objectives.

The study of submodular maximization in the streaming setting has been mostly surveyed above. However, we would like to note that besides the above mentioned results, there are also a few works on submodular maximization in the sliding window variant of the streaming setting (Chen et al., 2016; Epasto et al., 2017; Wang et al., 2017).

1.2 Our Technique

Technically, our algorithm is equivalent to dismissing every element of the stream with an appropriate probability, and then feeding the elements that have not been dismissed into the deterministic algorithm of Chekuri et al. (2015) for maximizing a monotone submodular function subject to a p-matchoid constraint. The random dismissal of elements gives the algorithm two advantages. First, it makes it faster because there is no need to process the dismissed elements. Second, it is well known that such a dismissal often transforms an algorithm for monotone submodular objectives into an algorithm with some approximation guarantee also for non-monotone objectives. However, besides the above important advantages, dismissing elements at random also has an obvious drawback, namely, the dismissed elements are likely to include a significant fraction of the value of the optimal solution. The crux of the analysis of our algorithm is its ability to show that the above mentioned loss of value due to the random dismissal of elements does not affect the approximation ratio. To do so, we prove a stronger version of a structural lemma regarding graphs and matroids that was implicitly proved by Varadaraja (2011) and later stated explicitly by Chekuri et al. (2015). The stronger version we prove translates into an improvement in the bound on the performance of the algorithm, which is not sufficient to improve the guaranteed approximation ratio on its own, but fortunately, is good enough to counterbalance the loss due to the random dismissal of elements.

We would like to note that the general technique of dismissing elements at random, and then running an algorithm for monotone submodular objectives on the remaining elements, was previously used by Feldman et al. (2017) in the context of offline algorithms. However, the method we use in this work to counterbalance the loss of value due to the random dismissal of stream elements is completely unrelated to the way this was achieved by Feldman et al. (2017).

2 Preliminaries

In this section, we introduce some notation and definitions that we later use to formally state our results. A set function f : 2^N → ℝ over a ground set N is non-negative if f(S) ≥ 0 for every S ⊆ N, monotone if f(S) ≤ f(T) for every S ⊆ T ⊆ N, and submodular if f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for every S, T ⊆ N. Intuitively, a submodular function is a function that obeys the property of diminishing returns, i.e., the marginal contribution of adding an element to a set diminishes as the set becomes larger and larger. Unfortunately, it is somewhat difficult to relate this intuition to the above (quite cryptic) definition of submodularity, and therefore, a more friendly equivalent definition of submodularity is often used. However, to present this equivalent definition in a simple form, we need some notation. Given a set S ⊆ N and an element u ∈ N, we denote by S + u and S − u the union S ∪ {u} and the expression S ∖ {u}, respectively. Additionally, the marginal contribution of u to the set S under the set function f is written as f(u ∣ S) ≜ f(S + u) − f(S). Using this notation, we can now state the above mentioned equivalent definition of submodularity, which is that a set function f is submodular if and only if

 f(u ∣ S) ≥ f(u ∣ T)  for every S ⊆ T ⊆ N and u ∈ N ∖ T.

Occasionally, we also refer to the marginal contribution of a set T to a set S (under a set function f), which we write as f(T ∣ S) ≜ f(S ∪ T) − f(S).
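For intuition, the diminishing-returns inequality can be checked exhaustively on a small coverage function, a standard example of a monotone submodular function (the ground set and covered items below are illustrative, not from the paper):

```python
from itertools import combinations

# Coverage function: f(S) = number of items covered by the sets indexed by S.
# Coverage functions are non-negative, monotone and submodular.
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def f(S):
    return len(set().union(*(cover[e] for e in S))) if S else 0

def marginal(u, S):
    return f(S | {u}) - f(S)  # f(u | S)

ground = set(cover)
subsets = [set(c) for r in range(len(ground) + 1)
           for c in combinations(sorted(ground), r)]
# f(u | S) >= f(u | T) whenever S is a subset of T and u lies outside T.
diminishing = all(marginal(u, S) >= marginal(u, T)
                  for T in subsets for S in subsets if S <= T
                  for u in ground - T)
print(diminishing)  # True
```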

A set system is a pair (N, I), where N is the ground set of the set system and I ⊆ 2^N is the set of independent sets of the set system. A matroid is a set system which obeys three properties: (i) the empty set is independent, (ii) if A ⊆ B and B is independent, then so is A, and finally, (iii) if A and B are two independent sets obeying |A| < |B|, then there exists an element u ∈ B ∖ A such that A + u is independent. In the following lines we define two matroid related terms that we use often in our proofs; however, readers who are not familiar with matroid theory should consider reading a more extensive presentation of matroids, such as the one given by Schrijver (2003, Volume B). A cycle of a matroid is an inclusion-wise minimal dependent set, and an element u is spanned by a set S if the maximum size independent subsets of S and S + u are of the same size. Note that it follows from these definitions that every element u of a cycle C is spanned by C − u.
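These matroid notions can be made concrete with a small partition matroid; the blocks and the per-block limit below are illustrative choices, not taken from the paper:

```python
# Partition matroid: at most LIMIT elements may be chosen from each block.
blocks = [{0, 1, 2}, {3, 4, 5}]
LIMIT = 2

def independent(S):
    return all(len(S & B) <= LIMIT for B in blocks)

def rank(S):
    # Size of a maximum independent subset of S.
    return sum(min(len(S & B), LIMIT) for B in blocks)

def spans(S, u):
    # u is spanned by S iff the maximum independent subsets of S and S + u
    # have the same size.
    return rank(S) == rank(S | {u})

# {0, 1, 2} is a cycle: it is dependent, yet every proper subset is independent.
C = {0, 1, 2}
print(independent(C))                        # False
print(all(independent(C - {u}) for u in C))  # True
print(all(spans(C - {u}, u) for u in C))     # True: each element is spanned by the rest
```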

A set system (N, I) is a p-matchoid, for some positive integer p, if there exist matroids M_1 = (N_1, I_1), M_2 = (N_2, I_2), …, M_m = (N_m, I_m) such that every element of N appears in the ground set of at most p out of these matroids and I = {S ⊆ N ∣ S ∩ N_l ∈ I_l for every 1 ≤ l ≤ m}. A simple example for a 2-matchoid is b-matching. Recall that a set S of edges of a graph is a b-matching if and only if every vertex v of the graph is hit by at most b(v) edges of S, where b is a function assigning integer values to the vertices. The corresponding 2-matchoid has the set of edges of the graph as its ground set and a matroid M_v for every vertex v of the graph, where the matroid M_v of a vertex v has in its ground set only the edges hitting v, and a set S of edges is independent in M_v if and only if |S| ≤ b(v). Since every edge hits only two vertices, it appears in the ground sets of only two vertex matroids, and thus, the resulting set system is indeed a 2-matchoid. Moreover, one can verify that a set of edges is independent in this 2-matchoid if and only if it is a valid b-matching.
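A minimal sketch of the b-matching example, with an illustrative toy graph and b-values (the graph and numbers are ours, not the paper's):

```python
# b-matching as a 2-matchoid: one matroid per vertex v, whose ground set is
# the edges hitting v, and in which a set S is independent iff |S| <= b(v).
edges = [("u", "v"), ("u", "w"), ("v", "w"), ("v", "x")]
b = {"u": 1, "v": 2, "w": 1, "x": 1}

def independent_in_vertex_matroid(S, v):
    return sum(1 for e in S if v in e) <= b[v]

def is_b_matching(S):
    # S is independent in the 2-matchoid iff it is independent in every
    # vertex matroid; each edge hits two vertices, hence p = 2.
    return all(independent_in_vertex_matroid(S, v) for v in b)

print(is_b_matching({("u", "v"), ("v", "x")}))  # True
print(is_b_matching({("u", "v"), ("u", "w")}))  # False: vertex u allows one edge
```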

The problem of maximizing a set function f subject to a p-matchoid constraint (N, I) asks us to find an independent set S ∈ I maximizing f(S). In the streaming setting we assume that the elements of N arrive sequentially in some adversarially chosen order, and the algorithm learns about each element only when it arrives. The objective of an algorithm in this setting is to maintain a set S ∈ I which approximately maximizes f, and to do so with as little memory as possible. In particular, we are interested in algorithms whose memory requirement does not depend on the size of the ground set N, which means that they cannot keep in their memory all the elements that have arrived so far. Our two results for this setting are given by the following theorems. Recall that k is the size of the largest independent set and m is the number of matroids used to define the p-matchoid constraint.

Theorem 1.

There is a streaming 4p-approximation algorithm for the problem of maximizing a non-negative monotone submodular function subject to a p-matchoid constraint whose space complexity is O(k). Moreover, in expectation, this algorithm uses O(km/p) value and independence oracle queries when processing each arriving element.

Theorem 2.

There is a streaming (2p + 2√(p(p+1)) + 1)-approximation algorithm for the problem of maximizing a non-negative submodular function subject to a p-matchoid constraint whose space complexity is O(k). Moreover, in expectation, this algorithm uses O(km/p) value and independence oracle queries when processing each arriving element.

3 Algorithm

In this section we prove Theorems 1 and 2. Throughout this section we assume that f is a non-negative submodular function over the ground set N, and (N, I) is a p-matchoid over the same ground set which is defined by the matroids M_1, M_2, …, M_m. Additionally, we denote by u_1, u_2, …, u_n the elements of N in the order in which they arrive. Finally, for an element u_i and sets S, T ⊆ N, we use the shorthands f(u_i : S) ≜ f(u_i ∣ S ∩ {u_1, u_2, …, u_{i−1}}) and f(T : S) ≜ Σ_{u_i∈T} f(u_i : S). Intuitively, f(u_i : S) is the marginal contribution of u_i to the part of S that arrived before u_i itself. One useful property of this shorthand is given by the following observation.

Observation 3.

For every two sets S, T ⊆ N, f(T ∣ S ∖ T) ≤ f(T : S).

Proof.

Let us denote the elements of T by u_{i_1}, u_{i_2}, …, u_{i_{|T|}}, where i_1 < i_2 < ⋯ < i_{|T|}. Then,

 f(T ∣ S ∖ T) = Σ_{j=1}^{|T|} f(u_{i_j} ∣ (S ∪ T) ∖ {u_{i_j}, u_{i_{j+1}}, …, u_{i_{|T|}}})
  ≤ Σ_{j=1}^{|T|} f(u_{i_j} ∣ S ∖ {u_{i_j}, u_{i_j+1}, …, u_n})
  = Σ_{j=1}^{|T|} f(u_{i_j} ∣ S ∩ {u_1, u_2, …, u_{i_j−1}}) = Σ_{j=1}^{|T|} f(u_{i_j} : S) = f(T : S),

where the inequality follows from the submodularity of f. ∎

Let us now present the algorithm we use to prove our results. This algorithm uses a procedure named Exchange-Candidate which appeared also in previous works, sometimes under the exact same name. Exchange-Candidate gets an independent set S and an element u, and its role is to output a set U ⊆ S such that S ∖ U + u is independent. The pseudocode of Exchange-Candidate is given as Algorithm 1.
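The role of Exchange-Candidate can be sketched as follows. The matroid interface (`contains`, `independent`, `cycle`) and the stored weights `w` are illustrative assumptions of ours, since Algorithm 1's actual pseudocode is not reproduced here; the idea is that adding u to an independent set closes at most one cycle per matroid, and the cheapest element of each such cycle is marked for removal:

```python
class UniformMatroid:
    # Rank-r uniform matroid over a given ground set (illustrative helper).
    def __init__(self, ground, r):
        self.ground, self.r = set(ground), r
    def contains(self, x):
        return x in self.ground
    def independent(self, T):
        return len(T) <= self.r
    def cycle(self, T):
        # In the use below |T| = r + 1, and such a set is a cycle: it is
        # dependent while every proper subset is independent.
        return set(T)

def exchange_candidate(S, u, matroids, w):
    # Return U ⊆ S such that (S - U) + u is independent in every matroid.
    U = set()
    for M in matroids:
        if not M.contains(u):
            continue
        T = {x for x in S if M.contains(x)} | {u}
        if not M.independent(T):
            C = M.cycle(T)  # the unique cycle created by adding u
            U.add(min((x for x in C if x != u), key=lambda x: w[x]))
    return U

M1 = UniformMatroid({"a", "u"}, 1)
M2 = UniformMatroid({"b", "c", "u"}, 2)
S, w = {"a", "b", "c"}, {"a": 1.0, "b": 2.0, "c": 3.0}
U = exchange_candidate(S, "u", [M1, M2], w)
print(sorted(U))  # ['a', 'b']
```

After removing U, the set {"c", "u"} is independent in both matroids, as required.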

Using the procedure Exchange-Candidate, we can now write our own algorithm, which is given as Algorithm 2. This algorithm has two parameters, a probability q ∈ (0, 1] and a value c > 0. Whenever the algorithm gets a new element u_i, it dismisses it with probability 1 − q. Otherwise, the algorithm finds, using Exchange-Candidate, a set U_i of elements whose removal from the current solution maintained by the algorithm allows the addition of u_i to this solution. If the marginal contribution of adding u_i to the solution is large enough compared to the value of the elements of U_i, then u_i is added to the solution and the elements of U_i are removed. While reading the pseudocode of the algorithm, keep in mind that S_i represents the solution of the algorithm after i elements have been processed.
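Putting the pieces together, here is a self-contained sketch of the sample-then-exchange loop for the special case of a single cardinality constraint (a 1-matchoid). The function name, the toy data, and the default parameters are illustrative; the default q follows the choice q = 1/(1 + (1 + c)p) used later in the analysis, with p = 1:

```python
import random

def sample_and_exchange(stream, f, k, c=1.0, q=None):
    # Streaming sketch for maximizing f(S) subject to |S| <= k (a 1-matchoid).
    # Each element is dismissed outright with probability 1 - q; a surviving
    # element replaces the cheapest stored element when its marginal gain is
    # at least (1 + c) times the value it displaces.
    if q is None:
        q = 1.0 / (1.0 + (1.0 + c))   # q = 1 / (1 + (1 + c)p) with p = 1
    S, w = set(), {}                  # current solution and stored gains
    for u in stream:
        if random.random() > q:       # dismissed: no oracle queries spent
            continue
        gain = f(S | {u}) - f(S)      # f(u | S)
        U = set() if len(S) < k else {min(S, key=lambda x: w[x])}
        if gain >= (1 + c) * sum(w[x] for x in U):
            S = (S - U) | {u}
            w[u] = gain
    return S

# Toy run with a coverage objective (illustrative data).
cover = {i: {i % 4, i % 7, i} for i in range(30)}
f = lambda S: len(set().union(*(cover[e] for e in S))) if S else 0
random.seed(0)
S = sample_and_exchange(range(30), f, k=3)
print(len(S) <= 3)  # True: the maintained solution is always feasible
```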

Observation 4.

Algorithm 2 can be implemented using O(k) memory and, in expectation, O(qkm) value and independence oracle queries per arriving element.

Proof.

An implementation of Algorithm 2 has to keep in memory at every given time point only three sets: the current solution S_{i−1}, the set U_i and the singleton {u_i}. Since these sets are all subsets of independent sets, each one of them contains at most k elements, and thus, O(k) memory suffices for the algorithm.

An arriving element which is dismissed immediately (which happens with probability 1 − q) does not require any value and independence oracle queries. Each remaining element requires O(km) such queries, and thus, in expectation an arriving element requires O(qkm) oracle queries. ∎

Algorithm 2 adds an element u_i to its solution if two things happen: (i) u_i is not dismissed by the random decision, and (ii) the marginal contribution of u_i with respect to the current solution is large enough compared to the value of U_i. Since checking (ii) requires more resources than checking (i), the algorithm checks (i) first. However, for analyzing the approximation ratio of Algorithm 2, it is useful to assume that (ii) is checked first. Moreover, for the same purpose, it is also useful to assume that the elements that pass (ii) but fail (i) are added to a set R. The algorithm obtained after making these changes is given as Algorithm 3. One should note that this algorithm has the same output distribution as Algorithm 2, and thus, the approximation ratio we prove for the first algorithm applies to the second one as well.

Let us denote by A the set of elements that ever appeared in the solution maintained by Algorithm 3; formally, A = ⋃_{i=1}^{n} S_i. The following lemma and corollary show that the elements of A ∖ S_n cannot contribute much to the output solution of Algorithm 3, and thus, their absence from S_n does not make S_n much less valuable than A.

Lemma 5.

Σ_{u_i∈A} f(U_i : S_{i−1}) ≤ f(S_n)/c.

Proof.

Fix an element u_i ∈ A. Then,

 f(S_i) − f(S_{i−1}) = f(S_{i−1} ∖ U_i + u_i) − f(S_{i−1}) = f(u_i ∣ S_{i−1} ∖ U_i) − f(U_i ∣ S_{i−1} ∖ U_i)   (1)
  ≥ f(u_i ∣ S_{i−1}) − f(U_i : S_{i−1}) ≥ c ⋅ f(U_i : S_{i−1}),

where the first inequality follows from the submodularity of f and Observation 3, and the second inequality holds since the fact that Algorithm 3 accepted u_i into its solution implies f(u_i ∣ S_{i−1}) ≥ (1 + c) ⋅ f(U_i : S_{i−1}).

We now observe that every element of A ∖ S_n must have been removed exactly once from the solution of Algorithm 3, which implies that {U_i ∣ u_i ∈ A} is a disjoint partition of A ∖ S_n. Using this observation, we get

 Σ_{u_i∈A} f(U_i : S_{i−1}) ≤ Σ_{u_i∈A} (f(S_i) − f(S_{i−1}))/c = (f(S_n) − f(∅))/c ≤ f(S_n)/c,

where the first inequality follows from Inequality (1), the equality holds since f(S_i) = f(S_{i−1}) whenever u_i ∉ A, and the second inequality follows from the non-negativity of f. ∎

Corollary 6.

f(A) ≤ (c + 1)/c ⋅ f(S_n).

Proof.

Since S_n ⊆ A by definition,

 f(A) = f(A ∖ S_n ∣ S_n) + f(S_n) ≤ f(A ∖ S_n : A) + f(S_n) ≤ Σ_{u_i∈A} f(U_i : S_{i−1}) + f(S_n)
  ≤ f(S_n)/c + f(S_n) = (c + 1)/c ⋅ f(S_n),

where the first inequality follows from Observation 3, the second holds by the submodularity of f since every u_j ∈ A ∖ S_n belongs to U_i for exactly one u_i ∈ A and S_{i−1} ∩ {u_1, u_2, …, u_{j−1}} ⊆ A ∩ {u_1, u_2, …, u_{j−1}}, and the third follows from Lemma 5. ∎

Our next goal is to show that the value of the elements of the optimal solution that do not belong to A is not too large compared to the value of S_n itself. To do so, we need a mapping from the elements of the optimal solution to elements of A. Such a mapping is given by Proposition 8. However, before we get to this proposition, let us first present Reduction 7, which simplifies the proof of Proposition 8.

Reduction 7.

For the sake of analyzing the approximation ratio of Algorithm 3, one may assume that every element u ∈ N belongs to exactly p out of the ground sets of the matroids defining the p-matchoid constraint.

Proof.

For every element u that belongs to the ground sets of only p′ < p out of the matroids M_1, M_2, …, M_m, we can add u to p − p′ additional matroids as a free element (i.e., an element whose addition to an independent set always keeps the set independent). One can observe that the addition of u to these matroids does not affect the behavior of Algorithm 3 at all, but makes u obey the technical property of belonging to exactly p out of the ground sets of the matroids. ∎

From this point on we implicitly make the assumption allowed by Reduction 7. In particular, the proof of Proposition 8 relies on this assumption.

Proposition 8.

For every set S ∈ I which does not include elements of R, there exists a mapping ϕ_S from the elements of S to multi-subsets of A such that

• every element of S_n appears at most p times in the multi-sets of ϕ_S.

• every element of A ∖ S_n appears at most p − 1 times in the multi-sets of ϕ_S.

• every element u_i ∈ S ∖ A obeys f(u_i ∣ S_{i−1}) ≤ (1 + c) ⋅ Σ_{u_j∈ϕ_S(u_i)} f(u_j : S_{d(j)−1}).

• every element u_i ∈ S ∩ A obeys u_j = u_i for every u_j ∈ ϕ_S(u_i), and the multi-set ϕ_S(u_i) contains exactly p elements (including repetitions).

The proof of Proposition 8 is quite long and involves many details, and thus, we defer it to Section 3.1. Instead, let us prove now a very useful technical observation. To present this observation we need some additional definitions. Let N^− ≜ {u_i ∈ N ∣ f(u_i ∣ S_{i−1}) < 0}. Additionally, for every 1 ≤ i ≤ n, we define

 d(i) = 1 + max{i ≤ j ≤ n ∣ u_i ∈ S_j}  if u_i ∈ A,  and  d(i) = i  otherwise.

In general, d(i) is the index of the element whose arrival made Algorithm 3 remove u_i from its solution. Two exceptions to this rule are as follows. If u_i was never added to the solution, then d(i) = i; and if u_i was never removed from the solution, then d(i) = n + 1.

Observation 9.

Consider an arbitrary element u_i ∈ A ∪ R.

• If u_i ∉ N^−, then f(u_i ∣ S_{i−1}) ≤ f(u_i : S_{i′}) for every i′ ≥ i − 1. In particular, since d(i) ≥ i, 0 ≤ f(u_i ∣ S_{i−1}) ≤ f(u_i : S_{d(i)−1}).

• u_i ∉ N^−.

Proof.

To see why the first part of the observation is true, consider an arbitrary element u_i ∈ (A ∪ R) ∖ N^−. Then,

 0 ≤ f(u_i ∣ S_{i−1}) ≤ f(u_i ∣ S_{i′} ∩ {u_1, u_2, …, u_{i−1}}) = f(u_i : S_{i′}),

where the second inequality follows from the submodularity of f and the inclusion S_{i′} ∩ {u_1, u_2, …, u_{i−1}} ⊆ S_{i−1} (which holds because elements are only added by Algorithm 3 to its solution at the time of their arrival).

It remains to prove the second part of the observation. Note that Algorithm 3 adds every arriving element to at most one of the sets A and R, and thus, these sets are disjoint; hence, to prove the observation it is enough to show that N^− is disjoint from both A and R. Assume towards a contradiction that this is not the case, and let u_i be the first element to arrive which belongs to N^− and to A ∪ R. Then,

 f(u_i ∣ S_{i−1}) ≥ (1 + c) ⋅ f(U_i : S_{i−1}) = (1 + c) ⋅ Σ_{u_j∈U_i} f(u_j : S_{d(j)−1}).

To see why that inequality leads to a contradiction, notice that its leftmost hand side is negative by our assumption that u_i ∈ N^−, while its rightmost hand side is non-negative by the first part of this observation since the choice of u_i implies that no element of U_i can belong to N^−. ∎

Using all the tools we have seen so far, we are now ready to prove the following theorem. Let OPT be an independent set of (N, I) maximizing f.

Theorem 10.

Assuming q = (1 + (1 + c)p)^{−1}, E[f(S_n)] ≥ c/((1 + c)² ⋅ p) ⋅ E[f(A ∪ OPT)].

Proof.

Since S_{i−1} ⊆ A for every 1 ≤ i ≤ n, the submodularity of f guarantees that

 f(A ∪ OPT) ≤ f(A) + Σ_{u_i∈OPT∖(R∪A)} f(u_i ∣ A) + Σ_{u_i∈(OPT∖A)∩R} f(u_i ∣ A)
  ≤ f(A) + Σ_{u_i∈OPT∖(R∪A)} f(u_i ∣ S_{i−1}) + Σ_{u_i∈(OPT∖A)∩R} f(u_i ∣ S_{i−1})
  ≤ (1 + c)/c ⋅ f(S_n) + Σ_{u_i∈OPT∖(R∪A)} f(u_i ∣ S_{i−1}) + Σ_{u_i∈OPT∩R} f(u_i ∣ S_{i−1}),

where the third inequality follows from Corollary 6 and the equality (OPT ∖ A) ∩ R = OPT ∩ R, which holds since A and R are disjoint by Observation 9. Let us now consider the mapping ϕ_{OPT∖R} whose existence is guaranteed by Proposition 8 when we choose S = OPT ∖ R. Then, the property guaranteed by Proposition 8 for elements of (OPT ∖ R) ∖ A implies

 Σ_{u_i∈OPT∖(R∪A)} f(u_i ∣ S_{i−1}) ≤ (1 + c) ⋅ Σ_{u_i∈OPT∖(R∪A)} Σ_{u_j∈ϕ_{OPT∖R}(u_i)} f(u_j : S_{d(j)−1}).

Additionally,

 Σ_{u_i∈OPT∖(R∪A)} Σ_{u_j∈ϕ_{OPT∖R}(u_i)} f(u_j : S_{d(j)−1}) + p ⋅ Σ_{u_i∈OPT∩A} f(u_i ∣ S_{i−1}) ≤ Σ_{u_i∈OPT∖R} Σ_{u_j∈ϕ_{OPT∖R}(u_i)} f(u_j : S_{d(j)−1})
  ≤ p ⋅ Σ_{u_j∈S_n} f(u_j : S_{d(j)−1}) + (p − 1) ⋅ Σ_{u_j∈A∖S_n} f(u_j : S_{d(j)−1})
  ≤ p ⋅ f(S_n) + (p − 1)/c ⋅ f(S_n) = ((1 + c)p − 1)/c ⋅ f(S_n),

where the first inequality follows from the properties guaranteed by Proposition 8 for elements of OPT ∩ A (note that the sets OPT ∖ (R ∪ A) and OPT ∩ A are a disjoint partition of OPT ∖ R) together with Observation 9, and the second inequality follows from the first two properties guaranteed by Proposition 8 and the fact that every element in the multi-sets produced by ϕ_{OPT∖R} belongs to A, and thus, obeys f(u_j : S_{d(j)−1}) ≥ 0 by Observation 9. Finally, the last inequality follows from Lemma 5 (since for every u_j ∈ A ∖ S_n we have S_{d(j)−1} = S_{i−1} for the element u_i whose arrival removed u_j, so these terms sum to Σ_{u_i∈A} f(U_i : S_{i−1})) and from the fact that d(j) = n + 1 for every u_j ∈ S_n, which yields Σ_{u_j∈S_n} f(u_j : S_n) = f(S_n) − f(∅) ≤ f(S_n). Combining all the above inequalities, we get

 f(A ∪ OPT) ≤ (1 + c)/c ⋅ f(S_n) + (1 + c) ⋅ [((1 + c)p − 1)/c ⋅ f(S_n) − p ⋅ Σ_{u_i∈OPT∩A} f(u_i ∣ S_{i−1})] + Σ_{u_i∈OPT∩R} f(u_i ∣ S_{i−1})
  = (1 + c)² ⋅ p/c ⋅ f(S_n) − (1 + c)p ⋅ Σ_{u_i∈OPT∩A} f(u_i ∣ S_{i−1}) + Σ_{u_i∈OPT∩R} f(u_i ∣ S_{i−1}).   (2)

By the linearity of expectation, to prove the theorem it only remains to show that the expectations of the last two terms on the rightmost hand side of Inequality (2) are equal. This is our objective in the rest of this proof. Consider an arbitrary element u_i ∈ OPT. When u_i arrives, one of two things happens. The first option is that Algorithm 3 discards u_i without adding it to either its solution or to R. The other option is that Algorithm 3 adds u_i to its solution (and thus, to A) with probability q, and to R with probability 1 − q. The crucial observation here is that at the time of u_i's arrival the set S_{i−1} is already determined, and thus, this set is independent of the decision of the algorithm to add u_i to A or to R; which implies the following equality (given an event E, we use here 1[E] to denote an indicator for it).

 E[1[u_i ∈ A] ⋅ f(u_i ∣ S_{i−1})]/q = E[1[u_i ∈ R] ⋅ f(u_i ∣ S_{i−1})]/(1 − q).

Rearranging the last equality, and summing it up over all elements u_i ∈ OPT, we get

 (1 − q)/q ⋅ E[Σ_{u_i∈OPT∩A} f(u_i ∣ S_{i−1})] = E[Σ_{u_i∈OPT∩R} f(u_i ∣ S_{i−1})].

Recall that we assume q = (1 + (1 + c)p)^{−1}, which implies (1 − q)/q = (1 + c)p. Plugging this equality into the previous one completes the proof that the expectations of the last two terms on the rightmost hand side of Inequality (2) are equal. ∎

Proving our result for monotone functions (Theorem 1) is now straightforward.

Proof of Theorem 1.

By plugging c = 1 and q = (1 + 2p)^{−1} into Algorithm 2, we get an algorithm which uses O(k) memory and, in expectation, O(km/p) oracle queries per arriving element by Observation 4. Additionally, by Theorem 10, this algorithm obeys

 E[f(S_n)] ≥ c/((1 + c)²p) ⋅ E[f(A ∪ OPT)] = 1/(4p) ⋅ E[f(A ∪ OPT)] ≥ 1/(4p) ⋅ f(OPT),

where the second inequality follows from the monotonicity of f. Thus, the approximation ratio of the algorithm we got is at most 4p. ∎
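As a quick numeric sanity check of the constants in this proof (the check itself is ours; c = 1 and q = 1/(1 + 2p) are the parameter choices suggested by the analysis):

```python
# With c = 1 the bound c / ((1 + c)^2 p) of Theorem 10 equals 1 / (4p), and
# q = 1 / (1 + 2p) keeps the expected query count q * O(km) at O(km / p).
for p in range(1, 50):
    c = 1.0
    assert abs(c / ((1 + c) ** 2 * p) - 1 / (4 * p)) < 1e-12
    q = 1 / (1 + (1 + c) * p)
    assert abs(q * (1 + 2 * p) - 1) < 1e-12  # q = 1 / (1 + 2p) exactly
    assert q <= 1 / (2 * p)                  # so q * km <= km / (2p)
print("ok")
```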

Proving our result for non-monotone functions is a bit more involved. First, we need the following known lemma.

Lemma 11 (Lemma 2.2 of (Buchbinder et al., 2014)).

Let f : 2^N → ℝ≥0 be a non-negative submodular function, and let A be a random subset of N containing every element of N with probability at most q (not necessarily independently). Then, E[f(A)] ≥ (1 − q) ⋅ f(∅).

The proof of Theorem 2 is now very similar to the above presented proof of Theorem 1, except that slightly different values for q and c are used, and in addition, Lemma 11 is now used to lower bound E[f(A ∪ OPT)] instead of the monotonicity of the objective that was used for that purpose in the proof of Theorem 1. A more detailed presentation of this proof is given below.

Proof of Theorem 2.

By plugging c = √(1 + 1/p) and q = (1 + (1 + c)p)^{−1} = (1 + p + √(p(p + 1)))^{−1} into Algorithm 2, we get an algorithm which uses O(k) memory and, in expectation, O(km/p) oracle queries per arriving element by Observation 4. Additionally, by Theorem 10, this algorithm obeys

 E[f(S_n)] ≥ c/((1 + c)²p) ⋅ E[f(A ∪ OPT)].

Let us now define g : 2^N → ℝ≥0 to be the function g(S) ≜ f(S ∪ OPT). Note that g is non-negative and submodular. Thus, by Lemma 11 and the fact that A contains every element of N with probability at most q (because Algorithm 2 accepts an element into its solution with at most this probability), we get

 E[f(A ∪ OPT)] = E[g(A)] ≥ (1 − q) ⋅ g(∅) = (1 − q) ⋅ f(OPT) = (p + √(p(p + 1)))/(p + √(p(p + 1)) + 1) ⋅ f(OPT)
  = (p + √(p(p + 1)))/(√(1 + 1/p) ⋅ (p + √(p(p + 1)))) ⋅ f(OPT) = (1/c) ⋅ f(OPT).

Combining the two above inequalities, we get

 E[f(S_n)] ≥ 1/((1 + c)²p) ⋅ f(OPT) = 1/((2 + 2√(1 + 1/p) + 1/p) ⋅ p) ⋅ f(OPT) = 1/(2p + 2√(p(p + 1)) + 1) ⋅ f(OPT).

Thus, the approximation ratio of the algorithm we got is at most 2p + 2√(p(p + 1)) + 1 ≤ 4p + 2. ∎
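Again as a numeric check (ours, not the paper's) that the algebra above is consistent and matches the abstract's 4p + 2 − o(1) claim:

```python
import math

# With c = sqrt(1 + 1/p): (1 + c)^2 * p equals 2p + 2*sqrt(p(p + 1)) + 1,
# this ratio never exceeds 4p + 2, and it approaches 4p + 2 as p grows.
for p in range(1, 1001):
    c = math.sqrt(1 + 1 / p)
    ratio = (1 + c) ** 2 * p
    assert abs(ratio - (2 * p + 2 * math.sqrt(p * (p + 1)) + 1)) < 1e-6
    assert ratio < 4 * p + 2
p = 10 ** 6
c = math.sqrt(1 + 1 / p)
print(4 * p + 2 - (1 + c) ** 2 * p < 1)  # True: the gap to 4p + 2 vanishes
```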

3.1 Proof of Proposition 8

In this section we prove Proposition 8. Let us first restate the proposition itself.

Proposition 8.

For every set S ∈ I which does not include elements of R, there exists a mapping ϕ_S from the elements of S to multi-subsets of A such that

• every element of S_n appears at most p times in the multi-sets of ϕ_S.

• every element of A ∖ S_n appears at most p − 1 times in the multi-sets of ϕ_S.

• every element u_i ∈ S ∖ A obeys f(u_i ∣ S_{i−1}) ≤ (1 + c) ⋅ Σ_{u_j∈ϕ_S(u_i)} f(u_j : S_{d(j)−1}).

• every element u_i ∈ S ∩ A obeys u_j = u_i for every u_j ∈ ϕ_S(u_i), and the multi-set ϕ_S(u_i) contains exactly p elements (including repetitions).

We begin the proof of Proposition 8 by constructing m graphs D_1, D_2, …, D_m, one for every one of the matroids defining the p-matchoid constraint. For every 1 ≤ l ≤ m, the graph D_l contains two types of vertices: its internal vertices are the elements of A ∩ N_l, and its external vertices are the elements of N_l ∖ (R ∪ A). Informally, the external vertices of D_l are the elements of N_l which were rejected upon arrival by Algorithm 3 and whose rejection can be (partially) blamed on the matroid M_l. The arcs of D_l are created using the following iterative process that creates some arcs of D_l in response to every arriving element. For every 1 ≤ i ≤ n, consider the element that the execution of Exchange-Candidate on the element u_i and the set S_{i−1} selected for removal due to the matroid M_l. From this point on we denote this element by x_{l,i}. If no element was selected by the above execution of Exchange-Candidate, or u_i ∈ R, then no arcs are created in response to u_i. Otherwise, let C_{l,i} be the single cycle of the matroid M_l in the set (S_{i−1} ∩ N_l) + u_i (there is exactly one cycle of M_l in this set because S_{i−1} ∩ N_l is independent in M_l, but (S_{i−1} ∩ N_l) + u_i is not independent in M_l). One can observe that C_{l,i} is exactly the cycle considered by the above mentioned execution of Exchange-Candidate, and thus, x_{l,i} ∈ C_{l,i}. We now denote by y_{l,i} the vertex out of {u_i, x_{l,i}} that does not belong to S_i; notice that there is exactly one such vertex since x_{l,i} appears in S_i if u_i ∉ A and does not appear in S_i if u_i ∈ A. Regardless of the node chosen as y_{l,i}, the arcs of D_l created in response to u_i are all the possible arcs from y_{l,i} to the other vertices of C_{l,i}. Observe that these are valid arcs for D_l in the sense that their endpoints (i.e., the elements of C_{l,i}) are all vertices of D_l: for the elements of C_{l,i} − u_i this is true since C_{l,i} − u_i ⊆ S_{i−1} ∩ N_l ⊆ A ∩ N_l, and for the element u_i this is true since the existence of x_{l,i} implies that u_i is a vertex of D_l (an internal one if u_i ∈ A, and an external one otherwise).

Some properties of the graphs D_l are given by the following observation. Given a graph D and a vertex u of D, we denote by N_D(u) the set of vertices to which there is an arc from u in D.

Observation 12.

For every 1 ≤ l ≤ m,

• every non-sink vertex u of D_l is spanned in M_l by the set N_{D_l}(u).

• for every two indexes 1 ≤ i, j ≤ n, if y_{l,i} and y_{l,j} both exist and i ≠ j, then y_{l,i} ≠ y_{l,j}.

• D_l is a directed acyclic graph.

Proof.

Consider an arbitrary non-sink vertex u of D_l. Since there are arcs leaving u, u must be equal to y_{l,i} for some 1 ≤ i ≤ n. This implies that u belongs to the cycle C_{l,i}, and that there are arcs from u to every other vertex of C_{l,i}. Thus, u is spanned by the vertices of N_{D_l}(u) because the fact that C_{l,i} is a cycle containing u implies that C_{l,i} − u spans u. This completes the proof of the first part of the observation.

Let us prove now a very useful technical claim. Consider an index i such that y_{l,i} exists, and let j be an arbitrary value j > i. We will prove that y_{l,i} does not belong to S_{j−1} + u_j. By definition, y_{l,i} is either u_i or the element x_{l,i}, which arrived before u_i; hence, in neither case is y_{l,i} equal to u_j. Moreover, combining the fact that y_{l,i} arrived no later than u_i did with the observation that y_{l,i} does not belong to S_i, we get that y_{l,i} cannot belong to S_{j−1} (because an element that is outside the solution at some point after its arrival never re-enters the solution), which implies the claim together with our previous observation that y_{l,i} ≠ u_j.

The technical claim that we proved above implies the second part of the observation, namely that for every two indexes 1 ≤ i, j ≤ n, if y_{l,i} and y_{l,j} both exist and i ≠ j, then y_{l,i} ≠ y_{l,j}. To see why that is the case, assume without loss of generality i < j. Then, the above technical claim implies that y_{l,i} ∉ S_{j−1} + u_j, which implies y_{l,i} ≠ y_{l,j} because y_{l,j} ∈ S_{j−1} + u_j.

At this point, let us assume towards a contradiction that the third part of the observation is not true, i.e., that there exists a cycle in D_l. Since every vertex of such a cycle has a non-zero out-degree, every such vertex must be equal to y_{l,i} for some 1 ≤ i ≤ n; moreover, since the indexes along a directed cycle cannot be strictly increasing all the way around, the cycle must contain an arc from y_{l,i} to y_{l,j} for some pair of indexes i > j. Since we already proved that y_{l,i} cannot be equal to y_{l,i′} for any i′ ≠ i, the arc from y_{l,i} to y_{l,j} must have been created in response to u_i; hence, y_{l,j} ∈ C_{l,i} ⊆ S_{i−1} + u_i, which contradicts the technical claim we have proved. ∎

One consequence of the properties of D_l proved by the last observation is given by the following lemma. A slightly weaker version of this lemma was proved implicitly by Varadaraja (2011), and was stated as an explicit lemma by Chekuri et al. (2015).

Lemma 13.

Consider an arbitrary directed acyclic graph D whose vertices are elements of some matroid M. If every non-sink vertex u of D is spanned by N_D(u) in M, then for every set of vertices of