
# Cardinality constrained submodular maximization for random streams

We consider the problem of maximizing submodular functions in single-pass streaming and secretaries-with-shortlists models, both with random arrival order. For cardinality constrained monotone functions, Agrawal, Shadravan, and Stein gave a single-pass (1-1/e-ε)-approximation algorithm using only linear memory, but their exponential dependence on ε makes it impractical even for ε=0.1. We simplify both the algorithm and the analysis, obtaining an exponential improvement in the ε-dependence (in particular, O(k/ε) memory). Extending these techniques, we also give a simple (1/e-ε)-approximation for non-monotone functions in O(k/ε) memory. For the monotone case, we also give a corresponding unconditional hardness barrier of 1-1/e+ε for single-pass algorithms in randomly ordered streams, even assuming unlimited computation. Finally, we show that the algorithms are simple to implement and work well on real world datasets.


## 1 Introduction

Over the past few decades, submodularity has become recognized as a useful property occurring in a wide variety of discrete optimization problems. Submodular functions model the property of diminishing returns, whereby the gain in a utility function decreases as the set of items considered increases. This property occurs naturally in machine learning, information retrieval, and influence maximization, to name a few (see Iyer et al. (2020) and the references therein).

In many settings, the data is not available in a random-access model; either for external reasons (customers arriving online) or because of the massive amount of data. When the data becomes too big to store in memory, we look for streaming algorithms that pass (once) through the data, and efficiently decide for each element whether to discard it or keep it in a small buffer in memory.

In this work, we consider algorithms that process elements arriving in a random order. Note that the classical greedy algorithm, which iteratively adds the element with the largest marginal gain Nemhauser et al. (1978), cannot be used here, since it needs random access to the whole input; hence we must look for new algorithmic techniques. The motivation for considering random element arrival comes from the prevalence of submodularity in big-data applications, in which data is often logged in batches that can be modelled as random samples from an underlying distribution. [Footnote 1: Note that the random order assumption generalizes the typical assumption of sampling stream elements i.i.d. from a known distribution, as the random order assumption applies equally well to unknown distributions.] The problem of streaming submodular maximization has recently received significant attention, both for random arrival order and for the more pessimistic worst-case (adversarial) arrival order Agrawal et al. (2019); Alaluf et al. (2020); Alaluf and Feldman (2019); Badanidiyuru et al. (2014); Badanidiyuru and Vondrák (2014); Chekuri et al. (2015); Feldman et al. (2018, 2020); Huang et al. (2020); Indyk and Vakilian (2019); Kazemi et al. (2019); McGregor and Vu (2019); Norouzi-Fard et al. (2018); Huang et al. (2020); Shadravan (2020).

### 1.1 Submodular functions and the streaming model

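Let $f: 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a non-negative set function satisfying $f(S \cup \{e\}) - f(S) \ge f(T \cup \{e\}) - f(T)$ for all $S \subseteq T \subseteq \mathcal{N}$ and $e \in \mathcal{N} \setminus T$. Such a function is called submodular. For simplicity, we assume $f(\emptyset) = 0$. [Footnote 2: Allowing $f(\emptyset) > 0$ only improves the approximation ratio of our algorithms.] We use the shorthand $f(T \mid S) := f(S \cup T) - f(S)$ to denote the marginal of $T$ on top of $S$. When $f(S) \le f(T)$ for all $S \subseteq T$, $f$ is called monotone. We consider the following optimization problem:

$$\mathrm{OPT} := \max_{|S| \le k} f(S),$$

where $k$ is a cardinality constraint on the solution size.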
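To make these definitions concrete, the sketch below checks submodularity (diminishing returns) and monotonicity by brute force on a tiny ground set. It enumerates all subsets, so it is only meant for toy examples; the coverage function `coverage` and its sets `SETS` are illustrative assumptions, not instances from the paper.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

SetFunction = Callable[[FrozenSet[int]], float]


def all_subsets(ground: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Every subset of a small ground set (exponential: toy examples only)."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def marginal(f: SetFunction, T: FrozenSet[int], S: FrozenSet[int]) -> float:
    """The marginal f(T | S) := f(S ∪ T) - f(S)."""
    return f(S | T) - f(S)


def is_submodular(f: SetFunction, ground: FrozenSet[int], tol: float = 1e-12) -> bool:
    """Check f(e | S) >= f(e | T) for all S ⊆ T and e ∉ T (diminishing returns)."""
    subsets = all_subsets(ground)
    for S in subsets:
        for T in subsets:
            if not S <= T:
                continue
            for e in ground - T:
                single = frozenset([e])
                if marginal(f, single, S) < marginal(f, single, T) - tol:
                    return False
    return True


def is_monotone(f: SetFunction, ground: FrozenSet[int]) -> bool:
    """Check f(S) <= f(S + e); single-element steps suffice for all S ⊆ T."""
    return all(f(S) <= f(S | {e}) for S in all_subsets(ground) for e in ground - S)


# Hypothetical toy coverage function: f(S) = number of items covered by the sets in S.
SETS = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}}


def coverage(S: FrozenSet[int]) -> float:
    return len(set().union(*(SETS[e] for e in S)))
```

Coverage functions are monotone and submodular, while a concave function of $|S|$ such as $|S|(3-|S|)$ is submodular but not monotone, which is the non-monotone setting of Section 3.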

Our focus is on submodular maximization in the streaming setting. In this setting, an algorithm is given a single pass over a dataset in a streaming fashion, where the stream is some permutation of the input dataset and each element is seen once. The stream is in random order when the permutation is uniformly random. When there are no constraints on the stream order, we call the stream adversarial.

At each step of the stream, our algorithm is allowed to maintain a buffer of input elements. When a new element is streamed in, the algorithm can choose to add the element to its buffer. To be as general as possible, we assume an oracle model: that is, we assume there is an oracle that returns the value of $f(S)$ for any set $S$. The decision of the algorithm to add an element is based only on queries to the oracle on subsets of the buffer elements. The algorithm may also choose to throw away buffered elements at any time. The goal in the streaming model is to minimize memory use, and the memory complexity of the algorithm is the maximum number of input elements the algorithm stores in the buffer at any given time.

For the oracle model, an important distinction is between weak oracle access and strong oracle access. In the weak oracle setting, the algorithm is only allowed to query feasible sets (sets of cardinality at most $k$). In the strong oracle setting, however, the algorithm is allowed to query any set of elements. All our results apply to both the weak and strong oracle models.
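The weak/strong distinction can be pictured as a thin wrapper around the value oracle. The class below is a hypothetical helper (not from the paper): with `k=None` it acts as a strong oracle, and with an integer `k` it rejects queries on sets larger than $k$, as a weak oracle would. It also counts queries, which is convenient when comparing streaming algorithms.

```python
from typing import Callable, FrozenSet, Iterable, Optional


class SetFunctionOracle:
    """Hypothetical value oracle for f that counts queries.

    k=None  -> strong oracle: any set may be queried.
    k=int   -> weak oracle: only feasible sets (|S| <= k) may be queried.
    """

    def __init__(self, f: Callable[[FrozenSet], float], k: Optional[int] = None):
        self._f = f
        self.k = k
        self.num_queries = 0

    def __call__(self, S: Iterable) -> float:
        S = frozenset(S)
        if self.k is not None and len(S) > self.k:
            # A weak oracle refuses infeasible queries instead of answering them.
            raise ValueError(f"weak oracle: |S| = {len(S)} exceeds k = {self.k}")
        self.num_queries += 1
        return self._f(S)
```

Rejected queries are not counted, so `num_queries` reflects only answered oracle calls.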

Our aim will be to develop algorithms that make only one pass over the data stream, using memory linear in $k$, where $k$ is the maximum size of a solution set. We assume that $k$ is small relative to $n$, the size of the ground set $\mathcal{N}$.

### 1.2 Our contributions

On the algorithmic side, we give a $(1-1/e-\varepsilon)$-approximation and a $(1/e-\varepsilon)$-approximation for cardinality constrained monotone and non-monotone submodular maximization, respectively, both using $O(k/\varepsilon)$ memory. The monotone result has an exponential improvement in memory requirements compared to Agrawal et al. Agrawal et al. (2019) (in terms of the dependence on $\varepsilon$), while the non-monotone result is the first to appear in the random-order streaming model, and improves upon the best known polynomial-time approximations under adversarial orders Alaluf et al. (2020). The algorithms are extremely simple to implement, and perform well on real world data (see Section 5), even compared to offline greedy algorithms.

On the hardness side, we prove that a $(1-1/e+\varepsilon)$-approximation for monotone submodular maximization would require large memory, even with unlimited queries and computational power (see Theorem 4.4 for the precise bounds). This improves the previous hardness bound of Agrawal et al. (2019).

### 1.3 Related work

Prior work on this problem has addressed both the adversarial and the random-order streaming settings. Algorithmic and hardness results further depend on whether the function is monotone or non-monotone, and whether $f$ has explicit structure (e.g., given by a set system in the coverage case) or is accessible only via oracle queries. Table 1 summarizes the relevant results.

#### Algorithmic results.

Submodular maximization in the streaming setting was first considered by Badanidiyuru et al. Badanidiyuru et al. (2014), who gave a $(1/2-\varepsilon)$-approximation in $O(k\log k/\varepsilon)$ memory for monotone submodular functions under a cardinality constraint, using a thresholding idea with parallel enumeration of possible thresholds. This work led to a number of subsequent developments, with the current best being a $(1/2-\varepsilon)$-approximation in memory linear in $k$ Kazemi et al. (2019). It turns out that the factor of $1/2$ is the best possible in the adversarial setting (with a weak oracle), but an improvement is possible in the random order model (where the input is ordered by a uniformly random permutation). This was first shown by Norouzi-Fard et al. Norouzi-Fard et al. (2018), who proved that the $1/2$-hardness barrier for cardinality constraints can be broken, exhibiting a $(1/2+c)$-approximation for a small absolute constant $c > 0$. In a breakthrough work, Agrawal, Shadravan, and Stein Agrawal et al. (2019) gave a $(1-1/e-\varepsilon)$-approximation whose memory is linear in $k$, but whose memory and running time depend exponentially on $1/\varepsilon$. We note that this is arbitrarily close to the optimal factor of $1-1/e$, but the algorithm is not practical due to its dependence on $\varepsilon$ (even for $\varepsilon = 0.1$, the resulting constants are astronomical).
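The thresholding idea can be sketched as follows. This is our own minimal rendering of the SieveStreaming rule, not the original authors' code: for each guess $v$ of OPT from a geometric grid $\{(1+\varepsilon)^i\}$ in $[m, 2km]$ (with $m$ the largest singleton value seen so far), an element $e$ is added to the guess's solution $S_v$ when $|S_v| < k$ and $f(e \mid S_v) \ge (v/2 - f(S_v))/(k - |S_v|)$. The function name and signature are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, Iterable


def sieve_streaming(
    f: Callable[[FrozenSet[int]], float], stream: Iterable[int], k: int, eps: float
) -> FrozenSet[int]:
    """Single pass; one threshold-greedy solution per guess v = (1 + eps) ** i."""
    best_single = 0.0
    sieves: Dict[int, FrozenSet[int]] = {}  # exponent i -> solution for guess v
    for e in stream:
        best_single = max(best_single, f(frozenset([e])))
        if best_single <= 0:
            continue  # no meaningful range of guesses yet
        lo = math.ceil(math.log(best_single, 1 + eps))
        hi = math.floor(math.log(2 * k * best_single, 1 + eps))
        # Discard guesses that fell out of the current range [m, 2km].
        sieves = {i: S for i, S in sieves.items() if lo <= i <= hi}
        for i in range(lo, hi + 1):
            S = sieves.setdefault(i, frozenset())
            if len(S) >= k:
                continue
            v = (1 + eps) ** i
            gain = f(S | {e}) - f(S)
            if gain >= (v / 2 - f(S)) / (k - len(S)):
                sieves[i] = S | {e}
    return max(sieves.values(), key=f, default=frozenset())
```

The number of parallel guesses is $O(\log k / \varepsilon)$, each holding at most $k$ elements, which is where the $O(k\log k/\varepsilon)$ memory comes from.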

#### Lower bounds.

A few lower bounds are known for monotone functions in the adversarial order model. With a weak oracle, Norouzi-Fard et al. Norouzi-Fard et al. (2018) showed a memory lower bound for any $(1/2+\varepsilon)$-approximation. Under a strong oracle, a memory lower bound for any $(1/2+\varepsilon)$-approximation algorithm was shown in a recent paper by Feldman et al. Feldman et al. (2020). Another recent lower bound was proved by McGregor and Vu McGregor and Vu (2019): a $(1-1/e+\varepsilon)$-approximation for coverage functions requires large memory (this lower bound holds for explicitly given inputs, via communication complexity; we note that it is incomparable to the computational hardness of approximating maximum coverage beyond $1-1/e$ Feige (1998)). For non-monotone functions, Alaluf et al. Alaluf et al. (2020) proved a memory lower bound for the adversarial order model with unbounded computation.

In the random-order model, Agrawal et al. Agrawal et al. (2019) show a memory lower bound for beating a certain constant approximation factor (for monotone submodular functions). In contrast, we show that the same construction as McGregor and Vu McGregor and Vu (2019) also applies to randomly ordered streams: a $(1-1/e+\varepsilon)$-approximation for coverage functions requires large memory even in the random-order model.

#### Submodular maximization in related models.

A closely related model is the secretary with shortlists model Agrawal et al. (2019), where an algorithm is allowed to store a shortlist of more than $k$ items (where $k$ is the cardinality constraint). Unlike in the streaming model, however, once an element goes into the shortlist, it cannot be removed. Then, after seeing the entire stream, the algorithm chooses a subset of size $k$ from the shortlist and returns it to the user. We note that the algorithms developed in this paper apply almost without modification to the shortlists model.

### 1.4 Overview of our techniques

#### Main algorithmic techniques.

The primary impetus for our algorithmic work was an effort to avoid the extensive enumeration involved in the algorithm of Agrawal et al. Agrawal et al. (2019), which leads to memory requirements exponential in $1/\varepsilon$.

To make things concrete, let us consider the input divided into $\alpha k$ disjoint windows of consecutive elements. The windows containing optimal elements play a special role; let us call them active windows, since these are the windows where we make quantifiable progress. When the stream is randomly ordered, we would ideally like each new element to be sampled independently and uniformly from the input. This leads to the intuition that the optimal elements are evenly spread out through all the windows. This cannot be literally true, since conditioned on the history of the stream, some elements have already appeared and cannot appear again. However, a key idea of Agrawal et al. Agrawal et al. (2019) allows us to circumvent this by reinserting the elements that we have already seen and that played a role in the selection process. What needs to be proved is that elements that were not selected can still appear in the future, conditioned on the history of the algorithm; this turns out to be true, provided that our algorithm operates in a certain greedy-like manner.

To ensure progress regardless of the positions of the optimal elements, previous work made use of exponentially large enumerations to essentially guess which windows the optimal elements arrive in. Where we depart from previous work is in the way we build our solution. The idea is to use an evolving family of solutions that are updated in parallel, so that we obtain a quantifiable gain regardless of where the optimal elements arrive. Specifically, we grow solutions $L_\ell$ in parallel, where solution $L_\ell$ has cardinality $\ell$. In each window, we attempt to extend a collection of solutions $L_\ell$ (for varying $\ell$) by a new element $e$; if $e$ is beneficial on average to the solutions in the collection, we replace each $L_{\ell+1}$ with the new solution $L_\ell + e$. Regardless of which windows happen to be active, we will show that the average gain over our evolving collection of solutions is analogous to that of the greedy algorithm. This is the basis of the analysis that leads to a factor of $1-1/e$.

In addition to our candidate solutions $L_\ell$, we maintain a pool $H$ of elements that our algorithm has ever included in some candidate solution. We then use $H$ to reintroduce elements artificially back into the input; this makes it possible to assume that every input element still appears in a future window with the same probability, which is key to the probabilistic analysis leading to the $(1-1/e-\varepsilon)$ guarantee.
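The per-window step described above can be sketched as follows. This is a simplified reading for illustration, not Algorithm 2 itself, and every name is hypothetical. Assumptions: the range of levels to update is passed in; candidates are the window's elements plus the pool $H$, but the re-insertion and sub-sampling probabilities of the real algorithm are omitted; and "beneficial on average" is read as a positive total marginal gain over the levels.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set

SetFunction = Callable[[FrozenSet[int]], float]


def process_window(
    f: SetFunction,
    levels: Iterable[int],
    window: Iterable[int],
    pool: Set[int],
    solutions: Dict[int, FrozenSet[int]],
) -> Optional[int]:
    """Hypothetical single-window update of the multi-level scheme (sketch only).

    solutions[l] is the level-l partial solution L_l (at most l elements in this
    sketch). `levels` is the range of levels to update in this window.
    """
    levels = [l for l in levels if l + 1 in solutions]
    old = dict(solutions)  # snapshot: all updates read the pre-window state
    # Candidates: current window plus re-inserted pool elements; sorting gives
    # a consistent tie-breaking rule.
    candidates = sorted(set(window) | pool)
    if not levels or not candidates:
        return None

    def total_gain(e: int) -> float:
        # Sum of marginals f(e | L_l) over the levels (same argmax as the average).
        return sum(f(old[l] | {e}) - f(old[l]) for l in levels)

    best = max(candidates, key=total_gain)
    if total_gain(best) <= 0:
        return None  # not beneficial on average: leave every level unchanged
    for l in levels:
        if best not in old[l]:
            solutions[l + 1] = old[l] | {best}  # L_{l+1} <- L_l + e
    pool.add(best)  # remember e so later windows can re-insert it
    return best
```

Reading from a snapshot matters: $L_{\ell+1}$ must be built from the pre-window $L_\ell$, not from a level already updated in the same window.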

#### Non-monotone functions.

Our algorithm for non-monotone submodular functions is similar, with the caveat that here we also have to be careful not to include any element in the solution with too large a probability. This is an important aspect of the randomized greedy algorithm for (offline) non-monotone submodular maximization Buchbinder et al. (2014), which in each step includes an element chosen uniformly at random from the top $k$ elements in terms of marginal value. We achieve a similar property by choosing the top element from the current window and a random subset of the pool $H$ of prior elements.

#### Hardness results.

Our hardness instances have the following general structure: there is a special subset of $k$ good elements, and the remaining elements are bad. The good elements are indistinguishable from each other, and likewise for the bad elements. In the monotone case, any $k$ bad elements are a $(1-1/e)$-factor worse than the optimal solution (the $k$ good elements). Suppose furthermore that, for some threshold parameter, as long as we never query the function on a subset containing that many good elements, the good elements are indistinguishable from bad elements. Then the only way to collect good elements in the memory buffer is by chance: until we have collected the required number of good elements, they are indistinguishable from bad elements, so the subset in the memory buffer is random. The classic work of Nemhauser and Wolsey Nemhauser and Wolsey (1978) constructs a pathological monotone submodular function with this property, which we use to prove that without sufficient memory the algorithm cannot beat $1-1/e$. McGregor and Vu McGregor and Vu (2019) construct a simple example of a coverage function with this property, which we use for our coverage bound. For an exponential-size ground set, we extend their construction to a larger threshold, which translates to an improved lower bound.

## 2 A (1−1/e−ε)-approximation in O(k/ε) memory

In this section, we develop a simple algorithm for optimizing a monotone submodular function with respect to a cardinality constraint. For the sake of exposition, we focus on the intuition behind the results and relegate full proofs to the appendix.

Our algorithm begins by randomly partitioning the stream into $\alpha k$ contiguous windows of expected size $n/(\alpha k)$, where $\alpha$ is a parameter controlling the memory dependence and approximation ratio. This is done by generating a random partition according to Algorithm 1. As the algorithm progresses, it maintains partial solutions $L_\ell$, the $\ell$-th of which contains exactly $\ell$ elements. Within each window, we process all the elements independently and choose one candidate element to extend the partial solutions by. We then add this element to a collection of partial solutions at the end of the window. The range of partial solution sizes that we use roughly tracks the number of optimal elements we expect to have seen so far in the stream.

Intuitively, our algorithm is guaranteed to make progress on windows that contain an element from the optimal solution $O$. Let us loosely call such windows active (a precise definition will be given later). Of course, the algorithm never knows which windows are active. However, the key idea of our analysis is that we are able to track the progress that our algorithm makes on active windows. Since the input stream is uniformly random, intuitively we expect to see about $\beta$ optimal elements after processing $\alpha\beta$ windows. With high probability, the true number of optimal elements seen will lie in a narrow range around $\beta$. By focusing on the average improvement over the levels $\ell$ in this range, we can show that, whenever a (random) optimal element arrives, the levels in this range gain on average at least $\frac{1}{k}\left(f(O) - f(L_\ell)\right)$ in expectation.

For the analysis to work, ideally we would like each arriving optimal element to be distributed uniformly among all optimal elements. This is not true conditioned on the history of decisions made by the algorithm. However, we can remedy this by re-inserting elements that we have selected before and subsampling the elements in the current window, with certain probabilities. A key lemma (Lemma 2.2) shows why this works: the elements we have never included might still appear, given the history of the algorithm. Our basic algorithm is described in Algorithm 2, with the window partitioning procedure described in Algorithm 1.

In its most basic implementation, Algorithm 2 stores both the pool $H$ and every partial solution $L_\ell$. However, there are several optimizations we can make. Algorithm 2 can be implemented so that the $L_\ell$'s are not stored directly at all. To avoid storing them, we can augment $H$ to contain not just each element $e$, but also the index of the window in which it was added. The window index tells us the range of levels that $e$ was inserted into, so all of the $L_\ell$'s can be reconstructed from $H$, since $H$ contains a history of all the insertions. Thus the memory use of Algorithm 2 is the size of $H$ at the end of the stream. Since there are $\alpha k$ windows and each window introduces at most one element to $H$, we have the following observation:

###### Observation 2.1.

Algorithm 2 uses at most $\alpha k$ space.
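The memory optimization above can be illustrated by replaying the stored (window index, element) pairs. This is a hypothetical sketch: it assumes the element chosen in window $i$ was inserted into exactly the levels returned by `level_range(i)`, via $L_{\ell+1} \leftarrow L_\ell + e$ (skipping levels that already contain $e$). Under that assumption the replay needs no oracle queries, so only $H$ (at most one entry per window) has to be stored.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple


def reconstruct_solutions(
    k: int,
    history: Iterable[Tuple[int, int]],
    level_range: Callable[[int], List[int]],
) -> Dict[int, FrozenSet[int]]:
    """Rebuild L_0..L_k from (window index, element) pairs stored in the pool.

    `level_range` (hypothetical) maps a window index to the levels updated there.
    """
    solutions = {l: frozenset() for l in range(k + 1)}
    for window_index, e in sorted(history):  # replay insertions in stream order
        old = dict(solutions)  # each window reads the state before that window
        for l in level_range(window_index):
            if l + 1 <= k and e not in old[l]:
                solutions[l + 1] = old[l] | {e}
    return solutions
```

Sorting by window index makes the replay independent of the order in which the pool happens to be stored.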

When $\mathcal{N}$ is streamed in random order, our partitioning procedure (Algorithm 1) has a much simpler interpretation. (A similar lemma can be found in the appendix of Agrawal et al. Agrawal et al. (2019).)

###### Lemma 2.1.

Suppose $\mathcal{N}$ is streamed according to a uniformly random permutation and we partition it by Algorithm 1 into $\alpha k$ windows. This is equivalent to assigning each element $e \in \mathcal{N}$ to one of $\alpha k$ different buckets uniformly and independently at random.
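Algorithm 1 is not reproduced here, so the following is only one partitioning procedure consistent with Lemma 2.1, under our own assumptions: draw window sizes from a multinomial distribution (throw each of the $n$ positions into one of $\alpha k$ uniform buckets) and cut the stream into consecutive windows of those sizes. For a uniformly random stream, conditional on the sizes the assignment of elements to windows is uniform among assignments with those counts, so any fixed assignment has probability $(\alpha k)^{-n}$, which is exactly independent uniform bucketing.

```python
import random
from collections import Counter
from typing import List, Sequence


def random_window_sizes(n: int, num_windows: int, rng: random.Random) -> List[int]:
    """Multinomial window sizes: each of n positions lands in a uniform bucket."""
    counts = Counter(rng.randrange(num_windows) for _ in range(n))
    return [counts[w] for w in range(num_windows)]


def split_into_windows(stream: Sequence, sizes: List[int]) -> List[list]:
    """Cut the stream into consecutive (possibly empty) windows of the given sizes."""
    windows, start = [], 0
    for size in sizes:
        windows.append(list(stream[start:start + size]))
        start += size
    return windows
```

Since only the sizes are random, the sizes can be drawn before the stream starts, and each element's window is then determined by its position alone.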

The algorithm's performance and behavior depend on the ordering of $\mathcal{N}$. Let us define the history of the algorithm up to the $i$-th window, denoted $H_i$, to be the sequence of all solutions produced up to that point. (Note that this history is only used in the analysis.) More precisely, we define the history as follows.

###### Definition 2.1.

Let $L^i_\ell$ denote the state of the set $L_\ell$ maintained by the algorithm before processing window $i+1$. We define $H_i$ to be the set of all triples $(e, \ell, j)$ such that element $e$ was added to solution $L_\ell$ in window $j \le i$. In other words, $H_i$ contains all of the changes that the algorithm made to its state while processing the first $i$ windows. For convenience, we sometimes treat $H_i$ as a set of elements and say that $e \in H_i$ if $(e, \ell, j) \in H_i$ for some $\ell$ and $j$.

The history $H_i$ describes the entire memory state of the algorithm up to the end of window $i$. In the following, we analyze the performance of the algorithm in the $(i+1)$-th window conditioned on the history $H_i$. Note that different random permutations of the input may produce this history, and we average over all of them in the analysis.

The next key lemma captures the intuition that elements not selected by the algorithm so far could still appear in the future, and bounds the probability with which this happens.

###### Lemma 2.2.

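Fix a history $H_i$. For any element $e \notin H_i$ and any window $j > i$, we have

$$\Pr[e \in w_j \mid H_i] \ge \frac{1}{\alpha k},$$

where $w_j$ denotes the set of elements in window $j$.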

Next, we define a set of active windows. In each active window, the algorithm is expected to make significant improvements to its candidate solution. The active windows will only be used in the analysis of the algorithm, and need not be computed in any way.

###### Definition 2.2.

Let be the optimal solution. For window , let be the probability that given . Define its active set to be the union of and the set obtained by sampling each with probability . We call an active window if and we call the active optimal elements of window .

Note that the construction of active sets in Definition 2.2 is valid, as Lemma 2.2 guarantees that the sampling probabilities involved are at most 1. More importantly, the active set subsamples the optimal elements so that each optimal element appears in it with the same probability, regardless of the history $H_i$. This allows us to tightly bound the number of active windows in the input, as we show in the next lemma.

###### Lemma 2.3.

Suppose we have streamed up to the $\alpha\beta$-th window of the input for some $\beta$. Then the expected number of active windows seen so far satisfies

$$\bar{Z}_{\alpha\beta} := \mathbb{E}[\text{number of active windows}] = \beta - \Theta(\beta/\alpha).$$

Furthermore, the actual number of active windows concentrates around $\bar{Z}_{\alpha\beta}$ with high probability.
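A quick numerical sanity check of the $\beta - \Theta(\beta/\alpha)$ behaviour, for the simplified notion of an active window as one that contains at least one of the $k$ optimal elements (this ignores the subsampling in Definition 2.2, so it illustrates the intuition rather than the lemma itself). With independent uniform buckets as in Lemma 2.1, each window is hit with probability $1-(1-\frac{1}{\alpha k})^k$, which lies between $\frac{1}{\alpha}-\frac{1}{2\alpha^2}$ and $\frac{1}{\alpha}$; summing over $\alpha\beta$ windows gives a value between $\beta - \frac{\beta}{2\alpha}$ and $\beta$.

```python
def expected_windows_hit(k: int, alpha: int, beta: int) -> float:
    """Expected number of the first alpha*beta windows containing at least one
    of the k optimal elements, when every element falls into one of alpha*k
    windows uniformly and independently (simplified notion of 'active')."""
    num_windows = alpha * k
    p_hit = 1.0 - (1.0 - 1.0 / num_windows) ** k  # P[a fixed window is hit]
    return alpha * beta * p_hit  # linearity of expectation over windows
```

For example, with $k=100$, $\alpha=10$ and $\beta=50$ the expectation is about $47.6$, inside the interval $[47.5, 50]$.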

Next, we analyze the expected gain in the solution after processing each active window. Let $A_{i+1}$ denote the event that window $i+1$ is active.

###### Lemma 2.4.

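Let $\bar{L}_{i+1}$ denote the range of levels $\ell$ that Algorithm 2 updates on window $i+1$ (its endpoints are defined in Algorithm 2). Conditioned on a history $H_i$ and window $i+1$ being active,

$$\sum_{\ell \in \bar{L}_{i+1}} \mathbb{E}\left[f(L^{i+1}_{\ell+1}) - f(L^{i}_{\ell}) \,\middle|\, H_i, A_{i+1}\right] \ge \frac{1}{k} \sum_{\ell \in \bar{L}_{i+1}} \left(f(O) - \mathbb{E}\left[f(L^{i}_{\ell}) \mid H_i\right]\right). \tag{1}$$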

Under an ideal partition, each element of $O$ appears in a different window, with one optimal element appearing roughly once every $\alpha$ windows. Thus after $k$ active windows, we expect to obtain a $(1-1/e)$-approximation (as in the standard greedy analysis).
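The standard greedy argument referred to here can be checked numerically in an idealized form, ignoring the averaging over levels: if each of $k$ active steps gains $(\mathrm{OPT} - v)/k$ from the current value $v$, the gap shrinks by a factor $(1-1/k)$ per step, so after $k$ steps $v_k = (1-(1-1/k)^k)\,\mathrm{OPT} \ge (1-1/e)\,\mathrm{OPT}$. The function name is ours.

```python
import math

ONE_MINUS_INV_E = 1.0 - math.exp(-1.0)


def greedy_recursion(k: int, opt: float = 1.0) -> float:
    """Start at 0; each of k steps closes a 1/k fraction of the remaining gap."""
    value = 0.0
    for _ in range(k):
        value += (opt - value) / k  # gain >= (OPT - current) / k, taken with equality
    return value
```

The closed form follows from $\mathrm{OPT} - v_{t+1} = (1 - 1/k)(\mathrm{OPT} - v_t)$, and $(1-1/k)^k \le 1/e$ for every $k \ge 1$.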

###### Theorem 2.5.

The expected value of the best solution found by the algorithm is at least

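$$\left(1 - \frac{1}{e} - O\!\left(\frac{1}{\alpha} + \alpha\sqrt{\frac{\log k}{k}}\right)\right)\mathrm{OPT}.$$

Setting $\alpha = \Theta(1/\varepsilon)$ (for $k$ sufficiently large relative to $1/\varepsilon$, so that the second error term is also $O(\varepsilon)$), we have a $(1-1/e-\varepsilon)$-approximation using $O(k/\varepsilon)$ memory.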

## 3 A (1/e−ε)-approximation for non-monotone submodular maximization

In this section, we show that the basic algorithm described in Algorithm 2 can be altered to give a $(1/e-\varepsilon)$-approximation for the cardinality constrained non-monotone case (Algorithm 3).

Algorithm 3 uses the same kind of multi-level scheme as Algorithm 2. However, Algorithm 3 further sub-samples the elements of the input so that the probability of including any given element is exactly a prescribed value (lines 7–13, coloured in orange). The sub-sampling allows us to bound the maximum probability that an element of the input is included in the solution. In particular, the sub-sampling is done by having the algorithm compute (on the fly) the conditional probability that an element could have been selected had it appeared in the past. This lets us compute an appropriate sub-sampling probability to ensure that no element appears in the solution with too high a probability. In terms of the proof, the sub-sampling allows us to perform an analysis similar to that of the RandomGreedy algorithm of Buchbinder et al. Buchbinder et al. (2014). [Footnote 3: A difference here is that instead of analysing a random element of the top-$k$ marginals, we analyse the optimal set directly.]

Since many of the main ideas are the same, we relegate the details of the analysis to the appendix.

#### Implementation of Algorithm 3

For clarity of exposition, Algorithm 3 computes the selection probabilities up front in line 4. In practice, however, we can compute them on the fly, since each element uses its value only once (lines 10 and 12). This avoids the memory cost of storing a value for every element. Finally, we assume that there are no ties when computing the best candidate element in each window. Ties can be handled by any arbitrary but consistent tie-breaking procedure. Any additional information used to break ties (for example, an ordering on the elements) must be stored alongside each stored element for the computation in line 10.
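The sub-sampling step rests on a simple rejection idea, sketched below under our own assumptions (Algorithm 3's exact target value and the way it computes the selection probability are not reproduced, and the function names are hypothetical). If an element would be selected with probability $q$, keeping it with probability $\min(1, p/q)$ makes its overall inclusion probability $q \cdot \min(1, p/q) = \min(q, p)$, i.e., capped at the target $p$.

```python
import random


def acceptance_probability(q_selected: float, target: float) -> float:
    """Keep-probability after selection, so that inclusion is min(q_selected, target).

    q_selected: probability (hypothetically computed on the fly) that the element
    would be selected; target: the desired cap on its inclusion probability.
    """
    if q_selected <= target:
        return 1.0  # already at or below the cap: always keep
    return target / q_selected


def keep_selected(q_selected: float, target: float, rng: random.Random) -> bool:
    """Coin flip applied only after the element has been selected."""
    return rng.random() < acceptance_probability(q_selected, target)
```

A consistent tie-breaking rule matters here: recomputing $q$ for an element later must reproduce the same choice the algorithm would have made.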

###### Theorem 3.1.

Algorithm 3 obtains a $(1/e-\varepsilon)$-approximation for maximizing a non-monotone submodular function with respect to a cardinality constraint in $O(k/\varepsilon)$ memory.

We remark that Algorithm 3 also achieves a guarantee of $1-1/e-\varepsilon$ for the monotone case, as Lemma 2.4 and Theorem 2.5 both still apply to Algorithm 3 when $f$ is monotone. The main difference between the two algorithms is the sub-sampling (lines 7–13), which increases the running time.

## 4 1−1/e hardness for monotone submodular maximization

The proofs of the following propositions and lemmas may be found in the appendix.

###### Proposition 4.1.

Fix subsets of elements (denoting “good” and “bad”) such that and ; let be some parameter. Let denote the size of the memory buffer, and let denote the probability that a random subset of size contains at least good elements. Let be a function that satisfies the following symmetries:

• is symmetric over good (resp. bad) elements, namely there exists such that

• For any set with good elements, does not distinguish between good and bad elements, namely for ,

Then any algorithm has expected value at most

$$\mathrm{ALG} \le (1 - p_k)\,\hat{f}(0, k) + p_k \cdot \mathrm{OPT}. \tag{2}$$
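The probability in Proposition 4.1 is a hypergeometric tail. The sketch below computes it exactly, assuming the random buffer is a uniformly random subset of the whole ground set; the parameter names (`n`, `num_good`, `buffer_size`, `threshold`) are our own, since the proposition's symbols are not reproduced here. It also evaluates the right-hand side of (2).

```python
from math import comb


def prob_at_least_good(n: int, num_good: int, buffer_size: int, threshold: int) -> float:
    """P[a uniform random subset of size buffer_size, drawn from n elements of
    which num_good are good, contains at least `threshold` good elements]."""
    total = comb(n, buffer_size)
    hits = sum(
        comb(num_good, g) * comb(n - num_good, buffer_size - g)  # comb is 0 if too few bad
        for g in range(threshold, min(num_good, buffer_size) + 1)
    )
    return hits / total


def alg_upper_bound(p: float, f_hat_bad: float, opt: float) -> float:
    """Right-hand side of (2): (1 - p) * f_hat(0, k) + p * OPT."""
    return (1 - p) * f_hat_bad + p * opt
```

When the threshold is large relative to the buffer, the tail probability is tiny and the bound in (2) collapses to roughly the value of the bad elements.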

We now consider a few different functions $f$ that satisfy the desiderata of Proposition 4.1.

###### Lemma 4.1 (monotone submodular function Nemhauser and Wolsey (1978)).

There exists a monotone submodular that satisfies the desiderata of Proposition 4.1 for , and such that:

• .

• .

###### Lemma 4.2 (polynomial-universe coverage function McGregor and Vu (2019)).

There exists a (monotone submodular) coverage function over a polynomial universe that satisfies the desiderata of Proposition 4.1 for , and such that:

• .

• .

###### Lemma 4.3 (exponential-universe coverage function (new construction)).

There exists a (monotone submodular) coverage function over an exponential universe that satisfies the desiderata of Proposition 4.1 for , and such that:

• .

• .

Our main hardness result follows from the lemmas above:

###### Theorem 4.4.

Any $(1-1/e+\varepsilon)$-approximation algorithm in the random order strong oracle model must use the following memory:

• for a general monotone submodular function.

• for a coverage function over a polynomial universe.

• for a coverage function over an exponential universe.

## 5 Experimental results

In the following section, we give experimental results for our monotone streaming algorithm. Due to space limitations, the experiments for the non-monotone algorithm can be found in the appendix. Our main goal is to show that our algorithm performs well in a practical setting and is simple to implement. In fact, we show that our algorithm is on par with offline algorithms in performance, and returns competitive solutions across a variety of datasets. All experiments were performed on a 2.7 GHz dual-core Intel i7 CPU with 16 GB of RAM.

We compare the approximation ratios obtained by our algorithm with three benchmarks:

• The offline LazyGreedy algorithm Minoux (1978), which is both theoretically optimal and obtains the same solution as greedy (in faster time). Note that we don’t expect to outperform it with a streaming algorithm; but as we hoped, our algorithm comes close.

• The SieveStreaming algorithm of Badanidiyuru et al. Badanidiyuru et al. (2014), which is the first algorithm to appear for adversarial streaming submodular optimization.

• The Salsa algorithm of Norouzi-Fard et al. Norouzi-Fard et al. (2018), which is the first "beyond $1/2$" approximation algorithm for random-order streams. This algorithm runs several variants of SieveStreaming in parallel, with thresholds that change as the algorithm progresses through the stream. As we would expect, our algorithm performs better on random arrival streams. [Footnote 4: SieveStreaming is also known as threshold greedy in the literature Badanidiyuru and Vondrák (2014). Note that the later SieveStreaming++ algorithm of Kazemi et al. (2019) is more efficient, but in terms of approximation ratio SieveStreaming is a stronger benchmark for adversarial order streaming.]
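For readers reproducing the offline baseline, here is a minimal sketch of Minoux-style lazy greedy (our own code, not the authors' implementation from the linked repository). Stale marginal gains remain valid upper bounds by submodularity, so an element whose refreshed gain still beats every other stored bound can be taken without re-evaluating the rest; this yields the same solution as plain greedy up to tie-breaking.

```python
import heapq
from typing import Callable, FrozenSet, Iterable


def lazy_greedy(f: Callable[[FrozenSet[int]], float], ground: Iterable[int], k: int) -> FrozenSet[int]:
    """Lazy greedy for monotone submodular f under |S| <= k."""
    solution: FrozenSet[int] = frozenset()
    value = f(solution)
    # Max-heap of (negated) upper bounds on marginal gains; ties broken by element id.
    heap = [(-(f(frozenset([e])) - value), e) for e in ground]
    heapq.heapify(heap)
    while len(solution) < k and heap:
        _, e = heapq.heappop(heap)
        gain = f(solution | {e}) - value  # refresh the possibly stale bound
        if not heap or gain >= -heap[0][0]:
            solution = solution | {e}  # still the best: take it
            value += gain
        else:
            heapq.heappush(heap, (-gain, e))  # reinsert with the tighter bound
    return solution
```

Approximation ratios of streaming algorithms can then be reported relative to `f(lazy_greedy(f, ground, k))`.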

Note that in terms of memory use, our algorithm is strictly more efficient. The analysis in previous sections shows that its memory is $O(k/\varepsilon)$ (with a small constant), versus a larger bound for both SieveStreaming and Salsa (SieveStreaming, for instance, uses $O(k\log k/\varepsilon)$ memory). Thus, in the experiments below, we focus on the approximation ratio obtained by our algorithm.

#### Datasets

Our datasets are drawn from set coverage instances from the 2003 and 2004 workshops on Frequent Itemset Mining Implementations on Data Mining (2003) and the Steiner triple instances of Beasley Beasley (1987). For each dataset, we run the three algorithms for a range of cardinality constraints $k$. The results are averaged across 10 random stream orderings. Table 2 describes the data sources. Figure 1 shows the performance of the three algorithms on each dataset. All code can be found at https://github.com/where-is-paul/submodular-streaming and all datasets can be found at https://tinyurl.com/neurips-21.

## 6 Conclusion and Future Work

In this work, we have presented memory-optimal algorithms for the problem of maximizing submodular functions with respect to cardinality constraints in the random order streaming model. Our algorithms achieve an optimal approximation factor of $1-1/e-\varepsilon$ for the monotone submodular case, and an approximation factor of $1/e-\varepsilon$ for the non-monotone case. In addition to theoretical guarantees, we show that the algorithm outperforms existing state-of-the-art streaming algorithms on a variety of datasets.

We close with a few open questions that would make for interesting future work. Although our algorithm is memory-optimal, it is not runtime-optimal. In particular, the SieveStreaming Badanidiyuru et al. (2014) and Salsa Norouzi-Fard et al. (2018) algorithms both run faster than our algorithm. The non-monotone variant of our algorithm runs even slower, as in its current form it performs costly sub-sampling operations for every stream element. Improving this runtime would greatly improve the practicality of our algorithm for extremely large cardinality constraints. Finally, there has been recent interest in examining streaming algorithms for streams under "adversarial injection" Garg et al. (2020). In such streams, the optimal elements of the stream are randomly ordered, while adversarial elements can be injected between the optimal elements with no constraints. Despite the seemingly large power of the adversary, the $1/2$ approximation barrier can still be broken in this model. It would be interesting to see if the work in this paper can be extended to such a setting.

The authors are indebted to Mohammad Shadravan and Morteza Monemizadeh for their insightful discussions that no doubt improved this work. The first author is supported by a VMWare Fellowship and the Natural Sciences and Engineering Research Council of Canada. The second and fourth authors are supported by NSF CCF-1954927, and the second author is additionally supported by a David and Lucile Packard Fellowship.

## References

• [1] S. Agrawal, M. Shadravan, and C. Stein (2019) Submodular Secretary Problem with Shortlists. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, LIPIcs, Vol. 124, pp. 1:1–1:19.
• [2] N. Alaluf, A. Ene, M. Feldman, H. L. Nguyen, and A. Suh (2020-02) Optimal Streaming Algorithms for Submodular Maximization with Cardinality Constraints. arXiv:1911.12959 [cs].
• [3] N. Alaluf and M. Feldman (2019-06) Making a Sieve Random: Improved Semi-Streaming Algorithm for Submodular Maximization under a Cardinality Constraint. arXiv:1906.11237 [cs].
• [4] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause (2014) Streaming Submodular Maximization: Massive Data Summarization on the Fly. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - August 24 - 27, 2014, S. A. Macskassy, C. Perlich, J. Leskovec, W. Wang, and R. Ghani (Eds.), pp. 671–680.
• [5] A. Badanidiyuru and J. Vondrák (2014) Fast Algorithms for Maximizing Submodular Functions. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, C. Chekuri (Ed.), pp. 1497–1514.
• [6] J.E. Beasley (1987) An algorithm for set covering problem. European Journal of Operational Research 31 (1), pp. 85–93. ISSN 0377-2217.
• [7] N. Buchbinder, M. Feldman, J. Naor, and R. Schwartz (2014) Submodular Maximization with Cardinality Constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, C. Chekuri (Ed.), pp. 1433–1452.
• [8] C. Chekuri, S. Gupta, and K. Quanrud (2015) Streaming algorithms for submodular function maximization. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part I, M. M. Halldórsson, K. Iwama, N. Kobayashi, and B. Speckmann (Eds.), Lecture Notes in Computer Science, Vol. 9134, pp. 318–330.
• [9] B. Doerr (2020) Probabilistic Tools for the Analysis of Randomized Optimization Heuristics. arXiv:1801.06733 [cs, math], pp. 1–87.
• [10] U. Feige (1998) A Threshold of ln n for Approximating Set Cover. J. ACM 45 (4), pp. 634–652.
• [11] M. Feldman, A. Karbasi, and E. Kazemi (2018) Do less, get more: streaming submodular maximization with subsampling. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 730–740.
• [12] M. Feldman, A. Norouzi-Fard, O. Svensson, and R. Zenklusen (2020) The One-way Communication Complexity of Submodular Maximization with Applications to Streaming and Robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, K. Makarychev, Y. Makarychev, M. Tulsiani, G. Kamath, and J. Chuzhoy (Eds.), pp. 1363–1374.
• [13] P. Garg, S. Kale, L. Rohwedder, and O. Svensson (2020) Robust algorithms under adversarial injections. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), A. Czumaj, A. Dawar, and E. Merelli (Eds.), LIPIcs, Vol. 168, pp. 56:1–56:15.
• [14] C. Huang, N. Kakimura, S. Mauras, and Y. Yoshida (2020-02) Approximability of Monotone Submodular Function Maximization under Cardinality and Matroid Constraints in the Streaming Model. arXiv:2002.05477 [cs].
• [15] C. Huang, T. Thiery, and J. Ward (2020) Improved multi-pass streaming algorithms for submodular maximization with matroid constraints. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, J. Byrka and R. Meka (Eds.), LIPIcs, Vol. 176, pp. 62:1–62:19.
• [16] P. Indyk and A. Vakilian (2019) Tight Trade-offs for the Maximum k-Coverage Problem in the General Streaming Model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, D. Suciu, S. Skritek, and C. Koch (Eds.), pp. 200–217.
• [17] R. K. Iyer, N. Khargonkar, J. A. Bilmes, and H. Asnani (2020) Submodular Combinatorial Information Measures with Applications in Machine Learning. CoRR abs/2006.15412.
• [18] E. Kazemi, M. Mitrovic, M. Zadimoghaddam, S. Lattanzi, and A. Karbasi (2019) Submodular Streaming in All its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 3311–3320.
• [19] P. Liu, A. Soni, E. Y. Kang, Y. Wang, and M. Parsana (2021) Diversity on the go! streaming determinantal point processes under a maximum induced cardinality objective. In The Web Conference 2021, TheWebConf 2021.
• [20] A. McGregor and H. T. Vu (2019) Better Streaming Algorithms for the Maximum Coverage Problem. Theory Comput. Syst. 63 (7), pp. 1595–1619.
• [21] M. Minoux (1978) Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pp. 234–243.
• [22] G. L. Nemhauser and L. A. Wolsey (1978) Best algorithms for approximating the maximum of a submodular set function. Mathematics of operations research 3 (3), pp. 177–188.
• [23] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher (1978) An Analysis of Approximations for Maximizing Submodular Set Functions - I. Math. Program. 14 (1), pp. 265–294.
• [24] A. Norouzi-Fard, J. Tarnawski, S. Mitrovic, A. Zandieh, A. Mousavifar, and O. Svensson (2018) Beyond 1/2-approximation for submodular maximization on massive data streams. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 3826–3835.
• [25] IEEE International Conference on Data Mining (2003 and 2004) Workshops on Frequent Itemset Mining Implementations.
• [26] M. Shadravan (2020) Improved submodular secretary problem with shortlists. CoRR abs/2010.01901.

## Appendix A Missing proofs from Section 2

We begin with a simple lemma showing that the values of the levels are monotone:

###### Lemma A.1.

For all and , and .

###### Proof.

First, we note that the second part of the lemma holds by lines 15–16. Let and be the values of and in Algorithm 2 at line 9 in window . Consider a window . There are two cases, depending on whether or not an element was added to the solutions. Suppose no element was added to the solution. Then all the levels remain the same. Line 15 guarantees that . Since no elements were added, so for every level . Now suppose an element was added in window . For levels and , , so . For levels , at the end of line 13, and its value only improves through lines 15–16. Thus . ∎

The rest of the proofs below correspond directly to unproven lemmas in Section 2.

###### Lemma 2.1.

Suppose is streamed according to a permutation chosen uniformly at random, and we partition by Algorithm 1 into windows. This is equivalent to assigning each to one of different buckets uniformly and independently at random.

###### Proof.

The way we define the window sizes is equivalent to placing each element independently into a random bucket, and letting be the number of elements in bucket . Hence the distribution of window sizes is correct. Conditioned on the window sizes, the assignment of elements to windows is determined by a uniformly random permutation, so every partition consistent with those sizes is equally likely. Therefore the distribution of elements into windows is equivalent to placing each element into a random window independently. ∎
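Lemma 2.1 can be sanity-checked by simulation. The sketch below mirrors the construction described in the proof rather than reproducing Algorithm 1 verbatim: it assumes elements are labeled 0, …, n−1 and windows 0, …, W−1 (with W playing the role of αk), draws the window sizes by throwing each element into a uniform bucket, and then fills the windows in order from a uniformly random permutation. If the lemma holds, each element's window is uniform and the windows of different elements are independent. The helper names are hypothetical.

```python
import random
from collections import Counter


def random_windows(n: int, num_windows: int, rng: random.Random) -> list[int]:
    """Return window_of[e] for elements e = 0..n-1 (hypothetical helper).

    Window sizes are the bucket counts obtained by throwing each element into a
    uniformly random bucket; the windows are then filled, in order, from a
    uniformly random permutation of the elements (the random arrival order).
    """
    counts = Counter(rng.randrange(num_windows) for _ in range(n))
    sizes = [counts[w] for w in range(num_windows)]

    order = list(range(n))
    rng.shuffle(order)  # uniformly random stream order

    window_of = [0] * n
    pos = 0
    for w, size in enumerate(sizes):
        for e in order[pos:pos + size]:
            window_of[e] = w
        pos += size
    return window_of


def empirical_window_distribution(n: int, num_windows: int, element: int,
                                  trials: int, seed: int = 0) -> list[float]:
    """Estimate Pr(element lands in window w) for every w by Monte Carlo."""
    rng = random.Random(seed)
    hits = Counter(random_windows(n, num_windows, rng)[element] for _ in range(trials))
    return [hits[w] / trials for w in range(num_windows)]
```

Under Lemma 2.1, `empirical_window_distribution(20, 5, 3, 20000)` should return five values close to 0.2, and the fraction of runs in which elements 0 and 1 both land in window 0 should be close to 1/25.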

###### Definition A.1.

Let $P$ be the mapping of elements to their window indices; i.e. if $e$ is in the $j$-th window, then $P(e) = j$. A partition $P$ is $H_{i-1}$-compatible if the algorithm produces history $H_{i-1}$ when streaming the first $i-1$ windows partitioned by $P$.

###### Lemma 2.2.

Fix a history . For any element , and any , we have

###### Proof.

Let be any element of and choose any . For each -compatible partition with , we begin by showing that we can create another -compatible partition by setting and all other values of equal to . In other words, any -compatible partition where is in window can be mapped to another where is in window .

Observe that because , must not have been chosen in windows or by the algorithm so far. There are two possible reasons for this: first, windows or could be greater than (further in the future) or equal to window , in which case window trivially does not affect . Second, consider the case when windows or is less than . If this is the case, then element already arrived in the stream but was not selected by the algorithm in any solution. Hence was either never the maximum element found in line 10, or if it was, its marginal value was not sufficient to replace the current solution. In either case, removing or adding to windows and will not change the history : if a different element was chosen for the update in window , this will still be the case; and if no update occurred, this will also still be the case. Finally, observe that the pool of elements for re-insertion will also not be affected, since element was not part of it.

Thus, we may change from to and maintain a -compatible partition. Since is equal to everywhere except on , this maps each such partition to a unique partition (and vice versa), establishing a bijection between -compatible partitions with and . Consequently, the number of partitions with compatible with is equal to the number of partitions with .

Let $J_e$ be the set of indices $j$ for which there exist $H_{i-1}$-compatible partitions $P$ with $P(e) = j$. The argument above applies to any windows $j, j' \in J_e$. In particular, $J_e$ contains all windows $j \ge i$, since these windows clearly do not affect $H_{i-1}$. For any element $e$ and any $j, j' \in J_e$, we have

$$\Pr(e\in w_j\mid H_{i-1}) = \frac{\#\{P : P(e)=j \text{ and } P \text{ is } H_{i-1}\text{-compatible}\}}{\#\{P : P \text{ is } H_{i-1}\text{-compatible}\}} = \frac{\#\{P : P(e)=j' \text{ and } P \text{ is } H_{i-1}\text{-compatible}\}}{\#\{P : P \text{ is } H_{i-1}\text{-compatible}\}} = \Pr(e\in w_{j'}\mid H_{i-1}).$$

The first and last equalities follow from Lemma 2.1, since every partition is equally likely; the middle equality is the bijection established above.

Any element must appear in some window, and it is equally likely to be in any of the windows where it could be present without affecting the history . So we have

$$1 = \sum_{s=1}^{\alpha k}\Pr(e\in w_s\mid H_{i-1}) = |J_e|\cdot\Pr(e\in w_i\mid H_{i-1})$$

The lemma follows from noting that . ∎

###### Proof.

By the definition of an active set, for any and . For , these elements are reintroduced by the algorithm with probability . Hence, for any , without conditioning on . Since the input permutation is uniformly random, and are independent for in any window . Letting , we have

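$$\Pr(\mathbb{1}_i) = 1-\left(1-\frac{1}{\alpha k}\right)^{k} = \frac{1}{\alpha} - \frac{1}{2\alpha^2} + O\!\left(\frac{1}{\alpha^3}\right)$$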

for large enough $k$ and $\alpha$. Thus,

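$$\mathbb{E}\sum_{i=1}^{\alpha\beta}\mathbb{1}_i = \alpha\beta\cdot\mathbb{E}\,\mathbb{1}_1 = \left(1 - \frac{1}{2\alpha} + O\!\left(\frac{1}{\alpha^2}\right)\right)\beta.$$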
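The asymptotic expansions for $\Pr(\mathbb{1}_i)$, the expectation, and (below) the variance bound are easy to check numerically. The sketch assumes nothing beyond the formula $\Pr(\mathbb{1}_i) = 1-(1-1/(\alpha k))^k$; the function names are illustrative, and the tolerances in the check ($1/\alpha^3$ and $O(1/\alpha^2)$) simply reflect the stated error terms.

```python
def active_probability(alpha: float, k: int) -> float:
    """Exact Pr(1_i) = 1 - (1 - 1/(alpha*k))**k for a single window."""
    return 1.0 - (1.0 - 1.0 / (alpha * k)) ** k


def expansion(alpha: float) -> float:
    """Second-order expansion 1/alpha - 1/(2 alpha^2) used in the proof."""
    return 1.0 / alpha - 1.0 / (2.0 * alpha ** 2)


def normalized_mean(alpha: float, k: int) -> float:
    """E[sum_{i<=alpha*beta} 1_i] / beta = alpha * Pr(1_1); compare with 1 - 1/(2 alpha)."""
    return alpha * active_probability(alpha, k)


def normalized_variance_bound(alpha: float, k: int) -> float:
    """alpha * (p - p^2) with p = Pr(1_1), i.e. the variance bound divided by beta;
    compare with 1 - 3/(2 alpha)."""
    p = active_probability(alpha, k)
    return alpha * (p - p * p)
```

For example, with $\alpha = 10$ and $k = 10^6$ the exact probability differs from $1/\alpha - 1/(2\alpha^2)$ by roughly $1/(6\alpha^3)$, the next term of the expansion of $1 - e^{-1/\alpha}$.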

Next, note that $\mathbb{1}_i$ and $\mathbb{1}_j$ (for $i \neq j$) are negatively dependent: conditioning on a window being active decreases the number of optimal elements available to the other windows (and conditioning on a window not being active increases the number available to the other windows). Thus we have:

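$$\operatorname{Var}\Big(\sum_{i=1}^{\alpha\beta}\mathbb{1}_i\Big) \le \alpha\beta\left(\mathbb{E}\,\mathbb{1}_1 - (\mathbb{E}\,\mathbb{1}_1)^2\right) = \left(1 - \frac{3}{2\alpha} + O\!\left(\frac{1}{\alpha^2}\right)\right)\beta.$$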

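Now we can apply the lower-tail bound of Hoeffding’s inequality ([9], Theorem 1.10.12) to get

$$\Pr\left(\sum_{i=1}^{\alpha\beta}\mathbb{1}_i \le -c\sqrt{\beta\log\tfrac{1}{\delta}} + \mathbb{E}\sum_{i=1}^{\alpha\beta}\mathbb{1}_i\right) \le \exp\left(-\frac{c^2\log\frac{1}{\delta}}{3}\right) \le \delta/2$$

for a large enough constant $c$.

Similarly, we may apply the upper-tail bound of Hoeffding’s inequality ([9], Corollary 1.10.13) to obtain

$$\Pr\left(\sum_{i=1}^{\alpha\beta}\mathbb{1}_i \ge c\sqrt{\beta\log\tfrac{1}{\delta}} + \mathbb{E}\sum_{i=1}^{\alpha\beta}\mathbb{1}_i\right) \le \delta/2$$

for a large enough constant $c$.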

Since , the number of active windows in the first windows is at least and at most with probability at least . ∎
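The concentration statement can also be observed empirically under a simplified model, which is our assumption rather than the full algorithm: each of the $k$ optimal elements lands in one of the $\alpha k$ windows uniformly and independently (as in Lemma 2.1), and a window counts as active exactly when it contains at least one of them. The reintroduction step of the algorithm is not modeled, and the helper names are hypothetical.

```python
import random
import statistics


def count_active_windows(k: int, alpha: int, beta: int, rng: random.Random) -> int:
    """Simplified model (assumption): each of the k optimal elements lands in one of
    alpha*k windows uniformly and independently; a window is active iff it holds at
    least one of them. Returns the number of active windows among the first alpha*beta."""
    num_windows = alpha * k
    occupied = {rng.randrange(num_windows) for _ in range(k)}
    return sum(1 for w in occupied if w < alpha * beta)


def simulate(k: int, alpha: int, beta: int, trials: int, seed: int = 0):
    """Return the sample mean and (population) variance of the active-window count."""
    rng = random.Random(seed)
    counts = [count_active_windows(k, alpha, beta, rng) for _ in range(trials)]
    return statistics.mean(counts), statistics.pvariance(counts)
```

With $k = 200$, $\alpha = 10$, $\beta = 100$, the sample mean should be close to $\alpha\beta\,(1-(1-1/(\alpha k))^k) \approx 95.2$, i.e. roughly $\beta - \beta/(2\alpha) = 95$, and the sample variance should stay below $\alpha\beta(p - p^2)$, consistent with the negative dependence used above.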

###### Lemma 2.3.

Suppose we have streamed up to the -th window of the input for some . Then the expected number of active windows seen so far satisfies

$$\bar{Z}_{\alpha\beta} := \text{expected number of active windows} = \beta - \Theta(\beta/\alpha).$$

Furthermore, the actual number of active windows concentrates around to within with probability .

###### Lemma 2.4.

Let where and are the values of and defined in Algorithm 2 on window . Conditioned on a history and window being active,

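$$\sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[f(L^{i+1}_{\ell+1}) - f(L^{i}_{\ell}) \mid H_i, A_{i+1}\right] \ge \frac{1}{k}\sum_{\ell\in\bar{L}_{i+1}}\left(f(O) - \mathbb{E}[f(L^{i}_{\ell})\mid H_i]\right). \qquad (3)$$
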
###### Proof.

As in the previous lemma, we first note that by the construction of the active set , any appears in with probability exactly , so for any . Also, the appearances of different elements are mutually independent. In particular, we have .

Order the ’s so that for . Given that for some random , we have

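$$\begin{aligned}
\sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[f(L^{i+1}_{\ell+1}) - f(L^{i}_{\ell}) \mid H_i, A_{i+1}\right]
&\ge \max_{e\in A_{i+1}} \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[f(e\,|\,L^{i}_{\ell}) \mid H_i, A_{i+1}\right] \\
&\ge \sum_{j=1}^{k} \frac{p(1-p)^{j-1}}{1-(1-p)^{k}} \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[f(o_j\,|\,L^{i}_{\ell}) \mid H_i, A_{i+1}\right] \\
&\ge \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[\frac{1}{k}\sum_{j=1}^{k} f(o_j\,|\,L^{i}_{\ell}) \mid H_i, A_{i+1}\right] \\
&\ge \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[\frac{1}{k}\left(f(O\cup L^{i}_{\ell}) - f(L^{i}_{\ell})\right) \mid H_i, A_{i+1}\right] \qquad \text{(Submodularity)} \\
&\ge \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[\frac{1}{k}\left(f(O) - f(L^{i}_{\ell})\right) \mid H_i, A_{i+1}\right] \qquad \text{(Monotonicity)} \\
&\ge \sum_{\ell\in\bar{L}_{i+1}} \mathbb{E}\left[\frac{1}{k}\left(f(O) - f(L^{i}_{\ell})\right) \mid H_i\right]. \qquad (4)
\end{aligned}$$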
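The second and third lines of (4) pair the weights $p(1-p)^{j-1}/(1-(1-p)^k)$ with Chebyshev's sum inequality. The sketch below checks numerically that these weights sum to one and are nonincreasing in $j$, so that pairing them with values sorted in nonincreasing order gives at least the plain average $\frac{1}{k}\sum_j v_j$. The values $v_j$ stand in for the summed marginal gains of the $o_j$, assuming the $o_j$ are indexed so that these values are nonincreasing; the function names and test values are illustrative.

```python
def geometric_weights(p: float, k: int) -> list[float]:
    """w_j = p(1-p)^(j-1) / (1 - (1-p)^k) for j = 1..k."""
    norm = 1.0 - (1.0 - p) ** k
    return [p * (1.0 - p) ** (j - 1) / norm for j in range(1, k + 1)]


def weighted_vs_uniform(values: list[float], p: float) -> tuple[float, float]:
    """Pair the weights with the values sorted in nonincreasing order and return
    (sum_j w_j v_j, (1/k) sum_j v_j). Chebyshev's sum inequality says the first
    is at least the second, because both sequences are nonincreasing."""
    vals = sorted(values, reverse=True)
    w = geometric_weights(p, len(vals))
    weighted = sum(wj * vj for wj, vj in zip(w, vals))
    uniform = sum(vals) / len(vals)
    return weighted, uniform
```

Equality holds when all values coincide, which matches the intuition that the weighting only helps when some optimal elements are worth more than others.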

The first line follows from the fact that , since contains and the entirety of . The numerator of follows from the fact that if is the maximum, then no elements in valued higher than it can appear in . The denominator is the probability that window is active. (All events are conditioned on $H_i$.) The third line is subtle and follows from Chebyshev’s sum inequality. Let and