# Streaming Based Bicriteria Approximation Algorithms for Submodular Optimization

This paper proposes the optimization problem Non-Monotone Submodular Cover (SC), which is to minimize the cost required to ensure that a non-monotone submodular benefit function exceeds a given threshold. Two algorithms are presented for SC, both of which give a bicriteria approximation guarantee. Both algorithms process the ground set as a stream, one of them in multiple passes. Further, a bicriteria approximation algorithm is given for the related problem of Non-Monotone Submodular Maximization subject to a knapsack constraint.


## 1 Introduction

A function $f: 2^U \to \mathbb{R}$ defined on subsets of a ground set $U$ of size $n$ is submodular if it possesses the following property: for all $A \subseteq B \subseteq U$ and $x \notin B$, $f(A \cup \{x\}) - f(A) \ge f(B \cup \{x\}) - f(B)$. Submodular set functions are found in many applications in data mining and machine learning including data summarization (Barezi et al., 2019; Xu et al., 2015; Tschiatschek et al., 2014), influence maximization in a social network (Kempe et al., 2003), dictionary selection (Das and Kempe, 2011), monitor placement (Leskovec et al., 2007), as well as in classic optimization problems such as maximum weighted cut, set cover, and facility location. Further, many applications involve non-monotone submodular functions (Barezi et al., 2019; Xu et al., 2015; Tschiatschek et al., 2014), where $f$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B \subseteq U$. The simplest optimization problem involving submodular functions is the NP-hard unconstrained submodular maximization problem, where we wish to find $\arg\max_{X \subseteq U} f(X)$. On the other hand, the constrained submodular maximization problem has received a lot of attention, where we maximize $f$ subject to some constraint. Examples include a cardinality constraint (Buchbinder et al., 2014) and a knapsack constraint (Gupta et al., 2010), both of which give NP-hard problems. However, in some applications, rather than maximizing the submodular function, we only require that the submodular function be sufficiently good and then minimize some other value. For example, in the data summarization application one may wish to find a sufficiently good summary (i.e. get a submodular function above a threshold) while minimizing the total memory of the summary. Motivated by this, we introduce here the optimization problem Non-Monotone Submodular Cover (SC).
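To make the two properties concrete, the following is a minimal sketch that checks them by brute force on a cut function, one of the classic non-monotone submodular examples mentioned above. The graph `EDGES` and the helper names are hypothetical and chosen only for illustration; the exhaustive check is exponential and only suitable for tiny ground sets.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper): undirected edges on 4 nodes.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
U = frozenset(range(4))

def cut_value(S):
    """Number of edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(A+x) - f(A) >= f(B+x) - f(B) for all A ⊆ B ⊆ ground, x ∉ B."""
    for B in subsets(ground):
        for A in subsets(B):
            for x in ground - B:
                if f(A | {x}) - f(A) < f(B | {x}) - f(B):
                    return False
    return True

def is_monotone(f, ground):
    """Check f(A) <= f(B) for all A ⊆ B ⊆ ground."""
    for B in subsets(ground):
        for A in subsets(B):
            if f(A) > f(B):
                return False
    return True

print(is_submodular(cut_value, U))  # cut functions are submodular
print(is_monotone(cut_value, U))    # but not monotone: cutting everything cuts nothing
```

On this graph the single node `{0}` cuts three edges while the full ground set cuts none, which is exactly the kind of behavior that rules out monotone-only techniques.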

###### Definition 1 (Non-Monotone Submodular Cover (SC)).

Let be a submodular function defined over subsets of the universe of size such that , and let be a cost function defined over the elements of the universe. The Non-Monotone Submodular Cover problem (SC) is, given , find

$$\arg\min_{X \subseteq U} \Big\{ \sum_{x \in X} w(x) : f(X) \ge \tau \Big\}.$$

An instance of SC is written as SC where is the cost of the optimum solution.
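As a point of reference for Definition 1, the following sketch solves a tiny SC instance exactly by enumerating all subsets. It is not one of the paper's algorithms; the cut function, the weights, and the name `solve_sc_brute_force` are hypothetical and serve only to show what the optimum cost means on a concrete instance.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
WEIGHTS = {0: 3, 1: 1, 2: 1, 3: 1}

def cut_value(S):
    """Non-monotone submodular benefit: edges with exactly one endpoint in S."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def solve_sc_brute_force(ground, f, w, tau):
    """Return (X, cost) minimizing sum of w over X subject to f(X) >= tau.

    Returns (None, inf) when no subset reaches the threshold.
    """
    best, best_cost = None, float("inf")
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            X = frozenset(combo)
            cost = sum(w[x] for x in X)
            if f(X) >= tau and cost < best_cost:
                best, best_cost = X, cost
    return best, best_cost

solution, opt = solve_sc_brute_force(WEIGHTS, cut_value, WEIGHTS, tau=3)
print(solution, opt)
```

Here node 2 alone reaches the threshold at cost 1, while node 0 reaches it at cost 3, so the optimum cost is 1. A threshold of 5 is infeasible on this graph because its maximum cut is 4.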

SC has been considered in the special setting where is also assumed to be monotone (Wolsey, 1982a), but to the best of our knowledge has never been considered in the non-monotone setting. The topic of this paper is therefore to consider whether approximation algorithms can be developed for this problem, and further whether those algorithms can be made practical in the face of large data sets. In particular, the streaming model is considered: is assumed to arrive in an arbitrary order, and the goal is to solve SC so that very few passes are made through the entire data set and in addition memory used at any point in time is limited.

### 1.1 Contributions

In particular, the contributions are:

• It is proven in Theorem 1 that we cannot approximate the constraint of SC better than in polynomially many queries of (under some complexity assumptions). On the other hand, two bicriteria approximation algorithms that can yield constraint approximations of arbitrarily close to are presented (Multi-Pass-Cover and Single-Pass-Cover).

• The algorithm Multi-Pass-Cover is proposed for SC, which is a multi-pass streaming approximation algorithm with a bicriteria approximation guarantee of , where is the approximation ratio of an unconstrained submodular maximization algorithm used as a subroutine (which can be 1/2 by using the algorithm of Buchbinder and Feldman (2018)). Multi-Pass-Cover takes at most passes through in an arbitrary order, while storing elements of total weight at most .

• The algorithm Single-Pass-Cover is proposed for SC, which takes a single pass through in an arbitrary order and has a bicriteria approximation guarantee of . The total weight of all elements stored at one time is at most , where is an input. Single-Pass-Cover does not have a bound on the total weight of stored elements relative to , which is in fact shown to be impossible. Instead, Single-Pass-Cover has a bound on the total weight of stored elements relative to , the optimum solution over the first elements read in, provided it exists: once the th element of the universe has been read in by Single-Pass-Cover, the total weight of all elements stored at one time is at most from that point on, where is instance dependent.

• A similar approach to Stream can be taken for the related problem SK (Submodular Knapsack). In particular, the algorithm Single-Pass-Knapsack is proposed for SK. Single-Pass-Knapsack takes a single pass through in an arbitrary order and has a bicriteria approximation guarantee of . The total weight of all stored elements is at most .

### 1.2 Notation and Definitions

The following notation and definitions will be used throughout the paper: (i) Define for to be ; (ii) Define ; (iii) Define .

### 1.3 Related Work

#### Unconstrained Submodular Maximization

SC is related to the unconstrained submodular maximization problem. In particular, if , then SC produces a solution to the unconstrained submodular maximization problem. It was proven by Feige et al. (2011) that given any instance of unconstrained submodular maximization with optimum solution , there is no algorithm using fewer than queries that always finds a solution of expected value at least for any (even if the objective is assumed to be symmetric). This implies Theorem 1 stated in this paper. A number of approximation algorithms have been proposed for unconstrained submodular maximization (Feige et al., 2011; Buchbinder et al., 2015; Dobzinski and Mor, 2015; Buchbinder and Feldman, 2018; Ene et al., 2018; Chen et al., 2019). One approach is via local search algorithms (Feige et al., 2011; Gharan and Vondrák, 2011; Dobzinski and Mor, 2015), of which the best deterministic approximation ratio is 0.4 (Dobzinski and Mor, 2015) and the best randomized is 0.41 (Gharan and Vondrák, 2011). On the other hand, Buchbinder et al. (2015) proposed a deterministic 1/3 and a randomized 1/2 approximation algorithm that follow more of a greedy approach, both running in linear time. Interestingly, a random set is a 1/4 approximation in expectation (Feige et al., 2011). Recently, Buchbinder and Feldman (2018) introduced a method of de-randomizing algorithms, which could be applied to get a 1/2 guarantee in time, or alternatively a guarantee in time. Chen et al. (2019) introduced a constant adaptivity algorithm with a randomized guarantee, based on the multilinear extension. Independently, the same result was found by Ene et al. (2018). All of the unconstrained submodular maximization algorithms require the entire ground set to be stored in memory. To the best of our knowledge, none of the above algorithms for unconstrained submodular maximization have been shown to give an approximation guarantee for SC.
For some of them, it is easy to see that they do not give a non-trivial approximation guarantee for SC: Both the local search algorithm of Feige et al. (2011) and the double greedy algorithm of Buchbinder et al. (2015) can return solutions that have times the cost of that of the optimal.

#### Submodular Cover

Variants of Submodular Cover have been studied where is assumed to be monotone (Wolsey, 1982b; Wan et al., 2010; Crawford et al., 2019). In particular, the greedy algorithm produces a -approximate solution, where and are instance dependent parameters (Wolsey, 1982b). In addition, a slightly modified greedy algorithm produces a -bicriteria approximation ratio. To the best of our knowledge, a non-monotone version of submodular cover has never been proposed. Monotone submodular cover with cardinality cost has been studied previously in the streaming setting (Norouzi-Fard et al., 2016). In particular, if an upper bound of the optimal solution is given, the streaming algorithm makes a single pass through the data and returns a approximate solution, storing a maximum of elements, and making at most function evaluations per received element. Alternatively, if passes are allowed through the data, the approximation ratio can be improved to a -bicriteria approximation. Iyer and Bilmes (2013) proposed algorithms where weighted cost could be extended to general monotone submodular cost function. It appears that our result could fit into their algorithm’s framework, therefore could be used for more general cost functions.

#### Constrained Submodular Maximization

The constrained submodular maximization problem has been extensively studied with many constraints, both with monotone objectives (Nemhauser and Wolsey, 1978; Badanidiyuru et al., 2014; Mirzasoleiman et al., 2015), and non-monotone (Lee et al., 2009; Gupta et al., 2010; Feige et al., 2011; Buchbinder et al., 2014, 2017). While the seminal greedy algorithm produces an optimal -approximate solution for monotone submodular maximization subject to a cardinality constraint (Nemhauser and Wolsey, 1978), it does not produce any non-trivial guarantee for non-monotone objectives. Submodular maximization and submodular cover are related: It was proven by Iyer and Bilmes that algorithms for submodular maximization can be used as a subroutine for algorithms for submodular cover (and vice versa). In particular, an -bicriteria approximation algorithm for submodular maximization with a knapsack constraint that runs in time can be used to get a -bicriteria approximation algorithm for submodular cover in time . This is done by guessing in order and running the submodular maximization algorithm with budget equal to each guess. For the case of uniform weights, the best currently known approximation guarantee for submodular maximization is 0.385, using the multilinear extension, to the best of our knowledge (Buchbinder and Feldman, 2019). In addition, it has been proven that in the value oracle model it is impossible to get a better approximation guarantee than 0.491 for uniform weights (Gharan and Vondrák, 2011). Therefore this approach to solving SC is limited. Notice that Single-Pass-Knapsack is a bicriteria approximation algorithm, and therefore does not necessarily produce a feasible solution and so can get approximation guarantee above 0.491. Of particular interest is the algorithm of Gupta et al. 
(2010) for submodular maximization with a knapsack constraint, which is a greedy-like approach that yields a solution that is -approximate, and a approximate algorithm for uniform cost. The downside to using this algorithm to solve SC as described above is that the resulting solution would not be very close to feasible, and the algorithm is relatively slow for knapsack constraints. An alternative for submodular maximization with a cardinality constraint is the family of randomized greedy approaches (Buchbinder et al., 2014, 2017). These give approximation guarantees in expectation about , and therefore can be used to get significantly closer to a feasible solution for SC compared to Gupta et al. (2010), but only for uniform cost. Further, they are faster, and the algorithm of Buchbinder et al. (2017) runs in linear time. One question is whether any of the greedy approaches can be extended to keep going and produce a solution that is closer to feasible for SC. This approach works when the objective is monotone. However, the non-monotonicity of the objective prevents this approach from working, and it is not clear that any greedy-like approach that works for submodular maximization can be extended to SC. A number of algorithms have been proposed for constrained submodular maximization in the streaming setting (Chakrabarti and Kale, 2015; Alaluf et al., 2020). The algorithm of Alaluf et al. is especially related to those proposed in this paper because it uses a procedure where elements from the stream are stored in disjoint sets, and then runs an offline algorithm on the result. Alaluf et al. provide a streaming algorithm for cardinality constrained submodular maximization that takes a single pass through the universe using ( is the cardinality constraint) memory. The resulting solution is a -approximate solution where is the approximation guarantee of the cardinality constrained submodular maximization algorithm used as a subroutine. This yields a 0.2779 approximation guarantee if using the state-of-the-art algorithm.

## 2 Algorithms and Theoretical Guarantees

In this Section, several bicriteria approximation algorithms are presented for SC, and their approximation guarantees proven. As mentioned in the introduction, the impossibility results of Feige et al. (2011) have implications for the approximation guarantees of SC. This is because an algorithm for SC can also be used as an algorithm for unconstrained submodular maximization by repeatedly guessing . These implications are stated in the following Theorem.

###### Theorem 1.

For any , there are instances of nonnegative symmetric submodular cover such that there is no (adaptive, possibly randomized) algorithm using fewer than queries that always finds a solution of expected value at least .

###### Proof.

Suppose such an algorithm existed, and let it be called . Then a new algorithm for unconstrained submodular maximization is defined as follows: is run on instance for every such that , and the solution with the highest value of is returned. Notice this results in running times. Because is in the above range, there exists some such that . Once is run on , by assumption it will return such that . This contradicts the result of Feige et al. (2011). ∎

As a result of Theorem 1, it is not possible to develop an -bicriteria approximation algorithm for SC such that . Therefore the algorithms presented in this section approximate the feasibility constraint as well as possible.

### 2.1 Stream

The algorithms for SC presented in this section all depend on a subroutine called Stream. In this section, Stream is described and some theoretical properties of Stream are proven.

#### Algorithm Description

Stream takes as input a parameter and a guess of the optimal solution value, . Stream then makes a single pass through the universe in an arbitrary order, and stores a portion of the elements of total cost at most . The stored elements are broken into disjoint sets, . The elements are chosen to be stored in at most one of the sets or not as they arrive, based upon their marginal gain to with respect to the set, as well as their cost. Once the entire stream has been read, an unconstrained maximization algorithm, UnconstrainedMax, with a approximation ratio, is run on the union of the stored elements. Pseudocode for Stream is presented in Algorithm 1.
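The description above can be sketched in code as follows. Algorithm 1 itself is not reproduced in this text, so several details are assumptions inferred from the proofs of Lemmas 1 and 2: the density threshold $\epsilon\tau/(2\kappa)$, the rule that elements with $w(x) > \kappa$ are skipped, the early stop once some set reaches weight $2\kappa/\epsilon$, placing an element in the first set that accepts it, and returning the best of $S_0$ and the stored sets. The exact enumeration `brute_force_max` is a stand-in for UnconstrainedMax (so $\gamma = 1$) and is only usable on toy inputs. All identifiers are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy instance (not from the paper): a cut function on 4 nodes.
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
WEIGHTS = {0: 3, 1: 1, 2: 1, 3: 1}

def cut_value(S):
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def brute_force_max(candidates, f):
    """Stand-in for UnconstrainedMax: exact maximizer by enumeration."""
    items = sorted(candidates)
    best = frozenset()
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if f(combo) > f(best):
                best = frozenset(combo)
    return best

def stream(elements, f, w, tau, kappa, eps, unconstrained_max=brute_force_max):
    """One pass over `elements`; returns (solution, total stored weight)."""
    num_sets = math.ceil(2 / eps)
    sets = [set() for _ in range(num_sets)]
    weights = [0] * num_sets
    density = eps * tau / (2 * kappa)  # assumed marginal-gain-per-cost threshold
    for x in elements:
        if w[x] > kappa:               # assumed: too costly for this guess
            continue
        for j in range(num_sets):
            gain = f(sets[j] | {x}) - f(sets[j])
            if gain / w[x] >= density:
                sets[j].add(x)         # stored in at most one set
                weights[j] += w[x]
                break
        if any(wt >= 2 * kappa / eps for wt in weights):
            break                      # assumed early-stop condition
    union = set().union(*sets)
    s0 = unconstrained_max(union, f)   # offline step on the stored elements
    best = max([set(s0)] + sets, key=f)
    return best, sum(weights)

solution, stored = stream([0, 1, 2, 3], cut_value, WEIGHTS, tau=3, kappa=1, eps=0.5)
print(solution, cut_value(solution), stored)
```

With these inputs node 0 is skipped because its weight exceeds the guess, nodes 1 and 2 land in the first set, node 3 in the second, and the offline step picks `{1, 3}` with cut value 4 from the three stored elements.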

#### Theoretical Guarantees of Stream

Two important theoretical results about Stream are stated and proven in this section. Lemma 1 gives guarantees as far as the total weight of all stored elements of Stream. Since the solution returned by Stream is a subset of all stored elements, this also implies theoretical guarantees about the weight of the returned solutions. Lemma 2 gives needed theoretical guarantees about the value of the solution returned by Stream.

###### Lemma 1.

Let be the set returned by Stream. Then , and further throughout the duration of Stream the total weight of all elements stored at once is at most .

###### Proof.

Consider the state of Stream at the beginning of an iteration of the for loop on Line 3, when an element has been read in but not yet added to any of the sets . Then the if statement on Line 6 ensures that for all , . At the end of the iteration, has been added to at most a single set , and the condition that on Line 4 to add ensures that . Further is a subset of . Therefore, at any point in Stream before Line 8,

$$w\Big(\bigcup_{i=0}^{2/\epsilon} S_i\Big) \le \sum_{i=0}^{2/\epsilon} w(S_i) \le \big(4/\epsilon^2 + 1\big)\kappa.$$

Therefore the bound on the total weight at any point in Stream holds. Because the solution returned by Stream is a subset of , the bound on its weight is the same as the bound on its total memory. ∎

###### Lemma 2.

Suppose that Stream is run with input , and . Let be the set returned by Stream. Then .

###### Proof.

The loop on Line 3 of Stream completes in one of two ways: (i) The if statement on Line 6 is satisfied; or (ii) All of the elements of have been read from the stream, and Line 6 was never satisfied. The proof of Lemma 2 is broken up into each of these two events. First suppose event (i) above occurs. Then at the completion of the loop there exists some such that . Let be after the element was added to it and . Then at the completion of Stream

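$$\begin{aligned}
f(S_r) &\overset{(a)}{\ge} f(S_r) - f(\emptyset) = \sum_{\ell=1}^{|S_r|} \Big( f\big(S_r^{(\ell)}\big) - f\big(S_r^{(\ell-1)}\big) \Big) \\
&\overset{(b)}{\ge} \sum_{x \in S_r} w(x)\,\epsilon\tau/(2\kappa) = w(S_r)\,\epsilon\tau/(2\kappa) \overset{(c)}{\ge} \tau
\end{aligned}$$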

where (a) is because $f(\emptyset) \ge 0$; (b) is by the condition on Line 7; and (c) is by the assumption that $w(S_r) \ge 2\kappa/\epsilon$. Therefore at the completion of Stream, $f(S_r) \ge \tau$. Now suppose that event (ii) above occurs. Then at the end of Stream, no set has triggered the condition on Line 6. For this case, we need the following claim.

###### Claim 1.

Let $A_1, \dots, A_m \subseteq U$ be disjoint, and $B \subseteq U$. Then there exists $i$ such that $f(A_i \cup B) \ge (1 - 1/m) f(B)$.

###### Proof.

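Define $g(X) = f(B \cup X)$. Then $g$ is a non-negative submodular function. Consider choosing $A$ uniformly at random from the disjoint sets $A_1, \dots, A_m$. Then any element of $U$ has probability at most $1/m$ of being in $A$. Then

$$\frac{1}{m} \sum_{i=1}^{m} f(B \cup A_i) = \frac{1}{m} \sum_{i=1}^{m} g(A_i) = \mathbb{E}[g(A)] \overset{(a)}{\ge} \Big(1 - \frac{1}{m}\Big) g(\emptyset) = \Big(1 - \frac{1}{m}\Big) f(B)$$

where (a) is from Lemma 4. Therefore there must exist some $i$ such that $f(B \cup A_i) \ge (1 - 1/m) f(B)$. ∎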
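The averaging argument behind Claim 1 can be sanity-checked numerically. The sketch below, a brute-force illustration rather than part of the proof, verifies on a hypothetical cut function that for every $B$ and two fixed families of disjoint sets, at least one $A_i$ satisfies $f(B \cup A_i) \ge (1 - 1/m) f(B)$. The graph and function names are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
U = frozenset(range(4))

def cut_value(S):
    """Non-negative, non-monotone submodular function used as f."""
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def subsets(ground):
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def claim_holds(f, B, parts):
    """True if some A in the disjoint family `parts` keeps (1 - 1/m) of f(B)."""
    m = len(parts)
    return any(f(B | A) >= (1 - 1 / m) * f(B) for A in parts)

families = [
    [frozenset({0}), frozenset({1}), frozenset({2}), frozenset({3})],  # m = 4
    [frozenset({0, 1}), frozenset({2, 3})],                            # m = 2
]
all_ok = all(claim_holds(cut_value, B, parts)
             for parts in families for B in subsets(U))
print(all_ok)
```

Note that adding a single $A_i$ may well decrease $f$, which is why the claim only guarantees a $(1 - 1/m)$ fraction; using more disjoint sets makes the loss smaller.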

Let be an optimal solution to the instance of SC. By Claim 1, there exists such that . Define and . Then,

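$$\begin{aligned}
(1 - \epsilon/2)\tau &\le f(S^* \cup S_t) \\
&= f(X_1 \cup S_t) + f(S^* \cup S_t) - f(X_1 \cup S_t) \\
&\overset{(a)}{\le} f(X_1 \cup S_t) + \sum_{x \in X_2} \Delta f(X_1 \cup S_t, x) \\
&\overset{(b)}{\le} f(X_1 \cup S_t) + \sum_{x \in X_2} \Delta f(S_t, x)
\end{aligned} \tag{1}$$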

where (a) and (b) are both due to submodularity. In addition,

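$$\sum_{x \in X_2} \Delta f(S_t, x) \overset{(a)}{<} \sum_{x \in X_2} w(x)\,\epsilon\tau/(2\kappa) = w(X_2)\,\epsilon\tau/(2\kappa) \overset{(b)}{\le} w(X_2)\,\epsilon\tau/(2\,OPT) \overset{(c)}{\le} \epsilon\tau/2 \tag{2}$$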

where (a) is by submodularity and the condition on Line 4; (b) is because ; (c) is because implies that . Then by combining Inequalities 1 and 2, we have that

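$$(1 - \epsilon)\tau \le f(X_1 \cup S_t) \overset{(a)}{\le} \max_{Y \subseteq \bigcup_{i=1}^{2/\epsilon} S_i} f(Y) \overset{(b)}{\le} \frac{1}{\gamma} f(S_0)$$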

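where (a) is because $X_1 \cup S_t \subseteq \bigcup_{i=1}^{2/\epsilon} S_i$; (b) is because $S_0$ is a $\gamma$-approximate maximum of $f$ over $\bigcup_{i=1}^{2/\epsilon} S_i$. Therefore $f(S_0) \ge \gamma(1 - \epsilon)\tau$. ∎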

### 2.2 Multi-Pass-Cover

In this section, the multiple pass streaming algorithm for SC, Multi-Pass-Cover, is presented. Multi-Pass-Cover takes passes through the universe , stores elements of total weight at most , and produces a constant bicriteria approximate solution with constraint approximation near to the optimal .

#### Algorithm Description

Multi-Pass-Cover takes as input a parameter . Multi-Pass-Cover works by sequentially running Stream for increasingly large guesses of . First, the smallest possible guess of is made for . Each iteration, the guess is increased by a multiplicative factor of . During each iteration, the solution returned by Stream is tested as to whether , where is the approximation ratio of UnconstrainedMax. Once the guess is at least , it is guaranteed that (as proven in Theorem 2), and then Multi-Pass-Cover returns and terminates. Pseudocode for Multi-Pass-Cover is given in Algorithm 2.
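The guessing loop can be illustrated in isolation. The sketch below assumes that the first guess is the smallest element weight and that each guess grows by a factor of $1 + \epsilon$, which is what the expression $(1+\epsilon)^{q-1} w_{\min}$ in the proof of Theorem 2 suggests; Algorithm 2 itself is not reproduced here. The predicate `accepts` is a hypothetical stand-in for "one pass of Stream with this guess returned a set passing the test", and the function name is illustrative.

```python
def geometric_guesses(w_min, eps, accepts):
    """Try guesses w_min, w_min*(1+eps), ... until `accepts(guess)` is True.

    Returns the accepted guess and the number of guesses tried; each guess
    corresponds to one pass over the stream in the multi-pass setting.
    """
    kappa = w_min
    passes = 0
    while True:
        passes += 1
        if accepts(kappa):
            return kappa, passes
        kappa *= 1 + eps

# Stand-in acceptance rule: the pass succeeds once the guess reaches a
# hypothetical optimum cost of 5.
OPT = 5
kappa, passes = geometric_guesses(w_min=1.0, eps=0.5, accepts=lambda k: k >= OPT)
print(kappa, passes)
```

Because guesses grow geometrically, the accepted guess overshoots the stand-in optimum by less than a factor of $1 + \epsilon$, and the number of passes grows only logarithmically in the ratio between the optimum and the smallest weight.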

#### Theoretical Guarantees of Multi-Pass-Cover

The theoretical guarantees of Multi-Pass-Cover are now presented in Theorem 2.

###### Theorem 2.

Suppose that Multi-Pass-Cover is run for an instance of SC. Then:

• The returned set satisfies and ;

• At most passes through are made;

• The total cost of all elements needing to be stored at once is at most ;

###### Proof.

Define to be the unique value where

 (1+ϵ)q−1wmin

By Lemma 2, if the loop on Line 3 reaches , Stream will return a set that satisfies . Then the if statement on Line 5 will be satisfied, and Multi-Pass-Cover will terminate with solution . Further, by Lemma 1, . Therefore item (i) is proven. Each iteration of the loop on Line 3 corresponds to one pass through . Since the loop on Line 3 stops before or once reaches (as explained above), there are at most passes through . Therefore item (ii) is proven. Over the course of Multi-Pass-Cover, increases from to (as explained above). Further, each iteration of the for loop on Line 3 stores elements only needed in the corresponding call of Stream. By Lemma 1, therefore the total weight of all elements stored at once is at most . Therefore item (iii) is proven. ∎

### 2.3 Single-Pass-Cover

While Multi-Pass-Cover takes passes through the ground set , an algorithm that makes a single pass through while storing a low total cost is desirable for applications in which the data is not stored at all. Unfortunately, it is not possible to develop a single pass streaming algorithm for SC that returns an approximately feasible solution, while also maintaining low total stored cost relative to . For example, suppose we have some single pass streaming algorithm for SC that produces a solution with constraint value at least . Consider two instances of SC with uniform weight defined as follows: (i) SC where is modular and for all ; (ii) SC where is modular and for all and . Suppose the algorithm receives the universe in order . Then because the returned solution has constraint value at least , in instance (i) the algorithm must store at least elements before reading element . On the other hand, instances (i) and (ii) are indistinguishable up to element , therefore for instance (ii) the algorithm also stores at least elements. However, in the latter case, and therefore this stored memory is very large compared to . In this section, a single pass algorithm is proposed, Single-Pass-Cover, that instead maintains low memory relative to the optimal solution to an instance of SC where the universe is only those elements read so far. Once the entire universe is read in, Single-Pass-Cover produces a solution with the same bicriteria approximation ratio as Multi-Pass-Cover.

#### Algorithm Description

Single-Pass-Cover takes as input a parameter , and a parameter . Like Multi-Pass-Cover, Single-Pass-Cover essentially works by running Stream for guesses of . However, Single-Pass-Cover runs Stream in parallel. Instead of guessing sequentially, Single-Pass-Cover maintains a set of guesses of and updates a lower bound for the guesses lazily. On the other hand, an upper bound for is initially given as input (), and then updated by running UnconstrainedMax for each guess after reading in each element. Pseudocode for Single-Pass-Cover is given in Algorithm 3.

#### Theoretical Guarantees of Single-Pass-Cover

The theoretical guarantees of Single-Pass-Cover are presented in Theorem 3.

###### Theorem 3.

Define . Then if :

• The set returned by Single-Pass-Cover satisfies and ;

• Single-Pass-Cover makes a single pass through ;

• The total weight of all elements stored at one time is at most ;

Let be the order that the elements of arrive in, , to be the instance of SC corresponding to ground set , and has a feasible solution. Then if :

• Once the iteration of the loop corresponding to element , , is complete, the total weight of all elements stored at one time is at most from that point on.

###### Proof.

Consider an alternate version of Stream where instead of running UnconstrainedMax on after receiving all elements in the stream (Line 8), UnconstrainedMax is run at the end of each iteration of the loop on Line 3 of Stream. I.e., UnconstrainedMax is run after reading in each element of . Notice that this does not change any of the properties of Stream detailed in Lemmas 1 and 2. Then, one can imagine Single-Pass-Cover as running many different instances of Stream in parallel as is read in. In particular, the set are the guesses of , and there is an instance of Stream corresponding to each guess. For each guess , in Single-Pass-Cover correspond to the sets in Stream. Consider the value of at the end of some iteration of the for loop on Line 3. It is now shown that without loss of generality, one can assume that up to this point Single-Pass-Cover is equivalent to running Stream in parallel with guesses of up to this point in the algorithm. is only decreasing throughout Stream, therefore we need to show that small guesses of are wlog running in parallel. Consider any . Consider any previous iteration of the loop on Line 3 such that for the first time an has arrived such that (i.e. the first time an element should be added to for some ), and we are at the beginning of the loop on Line 3. If , then

$$\frac{f(\{x\})}{w(x)} \ge \frac{\Delta f(\emptyset, x)}{w(x)} \ge \frac{\epsilon\tau}{2(1+\epsilon)^i} > m.$$

Therefore the if statement on Line 4 will be true, will be reset to , and added to the guesses of since

$$(1+\epsilon)^i \ge \frac{w(x)\,\epsilon\tau}{2\,\Delta f(\emptyset, x)} \ge \frac{\epsilon\tau}{2m}.$$

Item (i) is now proven. By Lemma 2, if there exists a run of Stream with a guess of that is at least as big, then the set returned by Stream has value at least . Therefore by the end of Single-Pass-Cover, any run of Stream corresponding to a guess of that is at least as big as must have triggered the if statement on Line 10. Since initially , and only decreases if the if statement on Line 10 is true, it must be that the solution of Single-Pass-Cover has . In addition, the above discussion implies that is no greater than at the end of Single-Pass-Cover, and then Lemma 1 implies the remaining part of item (i). Item (ii) holds by construction, since each element is read once. Item (iii) is now proven. By Lemma 1, the total weight of all elements stored by each run of Stream with input is , which is bounded above by . In addition, , and therefore there are at most parallel instances of Stream running in Single-Pass-Cover. This proves item (iii). Finally, item (iv) is proven. Suppose the iteration of the for loop on Line 3 corresponding to element is complete. By a nearly identical argument to that used for item (i), one can see that the largest guess of is no bigger than at this point. Therefore the largest memory for any run of Stream is by Lemma 1. As shown when proving item (iii), there are at most parallel instances of Stream running in Single-Pass-Cover. Altogether this implies item (iv). ∎

### 2.4 Submodular Maximization with a Knapsack Constraint

A related optimization problem to SC is Submodular Maximization with a Knapsack Constraint, defined as follows.

###### Definition 2 (Submodular Knapsack (SK)).

Let be a submodular function defined over subsets of the universe of size such that , and let be a cost function. The Submodular Knapsack problem (SK) is, given , find

$$\arg\max_{X \subseteq U} \Big\{ f(X) : \sum_{x \in X} w(x) \le \kappa \Big\}.$$

Bicriteria approximation algorithms similar in spirit to those presented previously for SC can also be used for SK. These algorithms as well as their theoretical guarantees are presented in the current section.
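For comparison with the streaming approach, a tiny SK instance can be solved exactly by enumeration. The sketch below is such a baseline, not the paper's Single-Pass-Knapsack; the cut function, weights, and the name `solve_sk_brute_force` are hypothetical and meant only to make Definition 2 concrete.

```python
from itertools import combinations

# Hypothetical toy instance (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
WEIGHTS = {0: 3, 1: 1, 2: 1, 3: 1}

def cut_value(S):
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def solve_sk_brute_force(ground, f, w, kappa):
    """Return (X, f(X)) maximizing f subject to total weight <= kappa."""
    best = frozenset()
    best_value = f(best)
    items = sorted(ground)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            X = frozenset(combo)
            if sum(w[x] for x in X) <= kappa and f(X) > best_value:
                best, best_value = X, f(X)
    return best, best_value

print(solve_sk_brute_force(WEIGHTS, cut_value, WEIGHTS, kappa=2))
print(solve_sk_brute_force(WEIGHTS, cut_value, WEIGHTS, kappa=1))
```

With budget 2 the best feasible set is `{1, 3}` with value 4, and with budget 1 it is `{2}` with value 3. A bicriteria algorithm is allowed to exceed the budget by a bounded factor, so its output is compared against this optimum but is not necessarily one of the feasible sets enumerated here.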

#### Algorithm Description

Single-Pass-Knapsack is most related to Single-Pass-Cover for SC, in that it runs Stream in parallel for many guesses of . However, it is simpler to guess for this problem. Single-Pass-Knapsack keeps track of a set of guesses of , , and updates them lazily (Line 6 of Algorithm 4). There is no need to run UnconstrainedMax repeatedly after every element is read in; instead, UnconstrainedMax is run for each guess of once all elements have been read in. Pseudocode for Single-Pass-Knapsack can be found in Algorithm 4.

#### Theoretical Guarantees of Single-Pass-Knapsack

###### Theorem 4.

Suppose that Single-Pass-Knapsack is run for SK with input . Then:

• The set returned by Single-Pass-Knapsack satisfies and ;

• At most elements of are stored all at once;

###### Proof.

In order to prove Theorem 4, a new version of Lemma 2 is needed. The following Lemma is proved in an essentially identical way to Lemma 2:

###### Lemma 3.

Suppose that Stream is run with input , and . Let be the set returned by Stream. Then .

Similar to Single-Pass-Cover, Single-Pass-Knapsack is essentially running many instances of Stream in parallel as is read in. In particular, the set are the guesses of , and there is an instance of Stream corresponding to each guess. For each guess , in Single-Pass-Knapsack correspond to the sets in Stream. Define to be the unique value such that

$$(1+\epsilon)^q \le OPT < (1+\epsilon)^{q+1}.$$

Then we may assume without loss of generality that there is an instance of Stream corresponding to as a guess of for the duration of Single-Pass-Knapsack, as explained as follows. First of all, clearly and therefore is at least the smallest guess throughout the duration of Single-Pass-Knapsack. On the other hand, suppose that for the first time we have received from the stream an element such that (i.e. the first time an element should be added to for some ). If at the beginning of the for loop then

$$\frac{f(\{x\})}{w(x)} \overset{(a)}{\ge} \frac{\Delta f(\emptyset, x)}{w(x)} \ge \frac{\epsilon(1+\epsilon)^q}{2\kappa} > m$$

where (a) is because . Therefore the if statement will be true, will be re-assigned as , and added to the guess of since

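$$(1+\epsilon)^q \le \frac{2\,\Delta f(\emptyset, x)\,\kappa}{w(x)\,\epsilon} \le \frac{2 f(\{x\})\,\kappa}{w(x)\,\epsilon} = \frac{2m\kappa}{\epsilon}$$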

and will remain in the guesses until the end. In light of the above, items (i) and (ii) follow by Lemmas 1 and 3. ∎

## 3 Appendix

###### Lemma 4.

(Lemma 2.2 from Feige et al. (2011)) Let be a non-negative submodular function. Denote by a random subset of where each element appears with probability at most (not necessarily independently). Then .
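Reading the lemma's conclusion as $\mathbb{E}[g(A(p))] \ge (1 - p)\, g(\emptyset)$, which is the form used in step (a) of the proof of Claim 1, the bound can be checked exactly on a small example. The sketch below assumes independent sampling with probability exactly $p$ (one special case covered by the lemma) and uses a hypothetical cut-based function $g(X) = f(\{0\} \cup X)$; the expectation is computed by enumerating all subsets rather than by sampling.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper).
EDGES = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
U = frozenset(range(4))

def cut_value(S):
    S = set(S)
    return sum(1 for u, v in EDGES if (u in S) != (v in S))

def g(X):
    """Non-negative submodular function g(X) = f(B ∪ X) with B = {0}."""
    return cut_value(frozenset({0}) | frozenset(X))

def expected_value(func, ground, p):
    """Exact E[func(A(p))] when each element joins A(p) independently w.p. p."""
    items = sorted(ground)
    n = len(items)
    total = 0.0
    for r in range(n + 1):
        for combo in combinations(items, r):
            total += (p ** r) * ((1 - p) ** (n - r)) * func(combo)
    return total

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = expected_value(g, U, p)
    rhs = (1 - p) * g(())
    print(p, lhs, rhs, lhs >= rhs - 1e-9)
```

For $p = 1/2$ the exact expectation on this instance is 2.5, comfortably above the bound $(1 - p)\, g(\emptyset) = 1.5$; the bound is tight only at the extremes, for example at $p = 1$ where both sides are 0 here.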

## References

• Alaluf et al. [2020] Naor Alaluf, Alina Ene, Moran Feldman, Huy L Nguyen, and Andrew Suh. Optimal streaming algorithms for submodular maximization with cardinality constraints. In 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020.
• Badanidiyuru et al. [2014] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming Submodular Maximization: Massive Data Summarization on the Fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and Data Mining (KDD), pages 671–680, 2014.
• Barezi et al. [2019] Elham J Barezi, Ian D Wood, Pascale Fung, and Hamid R Rabiee. A submodular feature-aware framework for label subset selection in extreme classification problems. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1009–1018, 2019.
• Buchbinder and Feldman [2018] Niv Buchbinder and Moran Feldman. Deterministic algorithms for submodular maximization problems. ACM Transactions on Algorithms (TALG), 14(3):1–20, 2018.
• Buchbinder and Feldman [2019] Niv Buchbinder and Moran Feldman. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 44(3):988–1005, 2019.
• Buchbinder et al. [2014] Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1433–1452. SIAM, 2014.
• Buchbinder et al. [2015] Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM Journal on Computing, 44(5):1384–1402, 2015.
• Buchbinder et al. [2017] Niv Buchbinder, Moran Feldman, and Roy Schwartz. Comparing apples and oranges: Query trade-off in submodular maximization. Mathematics of Operations Research, 42(2):308–329, 2017.
• Chakrabarti and Kale [2015] Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: Matchings, matroids, and more. Mathematical Programming, 154(1):225–247, 2015.
• Chen et al. [2019] Lin Chen, Moran Feldman, and Amin Karbasi. Unconstrained submodular maximization with constant adaptive complexity. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 102–113, 2019.
• Crawford et al. [2019] Victoria Crawford, Alan Kuhnle, and My Thai. Submodular cost submodular cover with an approximate oracle. In International Conference on Machine Learning, pages 1426–1435. PMLR, 2019.
• Das and Kempe [2011] Abhimanyu Das and David Kempe. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
• Dobzinski and Mor [2015] Shahar Dobzinski and Ami Mor. A deterministic algorithm for maximizing submodular functions. arXiv, pages arXiv–1507, 2015.
• Ene et al. [2018] Alina Ene, Huy L Nguyen, and Adrian Vladu. A parallel double greedy algorithm for submodular maximization. arXiv preprint arXiv:1812.01591, 2018.
• Feige et al. [2011] Uriel Feige, Vahab S Mirrokni, and Jan Vondrák. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.
• Gharan and Vondrák [2011] Shayan Oveis Gharan and Jan Vondrák. Submodular maximization by simulated annealing. In Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms, pages 1098–1116. SIAM, 2011.
• Gupta et al. [2010] Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics, pages 246–257. Springer, 2010.
• Iyer and Bilmes [2013] Rishabh K Iyer and Jeff A Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems, pages 2436–2444, 2013.
• Kempe et al. [2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM, 2003.
• Lee et al. [2009] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 323–332. ACM, 2009.
• Leskovec et al. [2007] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 420–429, 2007.
• Mirzasoleiman et al. [2015] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrák, and Andreas Krause. Lazier than lazy greedy. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
• Nemhauser and Wolsey [1978] G L Nemhauser and L A Wolsey. Best Algorithms for Approximating the Maximum of a Submodular Set Function. Mathematics of Operations Research, 3(3):177–188, 1978.
• Norouzi-Fard et al. [2016] Ashkan Norouzi-Fard, Abbas Bazzi, Marwa El Halabi, Ilija Bogunovic, Ya-Ping Hsieh, and Volkan Cevher. An efficient streaming algorithm for the submodular cover problem. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4500–4508, 2016.
• Tschiatschek et al. [2014] Sebastian Tschiatschek, Rishabh K Iyer, Haochen Wei, and Jeff A Bilmes. Learning mixtures of submodular functions for image collection summarization. In Advances in neural information processing systems, pages 1413–1421, 2014.
• Wan et al. [2010] Peng Jun Wan, Ding Zhu Du, Panos Pardalos, and Weili Wu. Greedy approximations for minimum submodular cover with submodular cost. Computational Optimization and Applications, 45(2):463–474, 2010.
• Wolsey [1982a] Laurence A Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.
• Wolsey [1982b] Laurence A Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.
• Xu et al. [2015] Jia Xu, Lopamudra Mukherjee, Yin Li, Jamieson Warner, James M Rehg, and Vikas Singh. Gaze-enabled egocentric video summarization via constrained submodular maximization. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2235–2244, 2015.