1 Introduction
Over the past few decades, submodularity has become recognized as a useful property occurring in a wide variety of discrete optimization problems. Submodular functions model the property of diminishing returns, whereby the gain in a utility function decreases as the set of items considered increases. This property occurs naturally in machine learning, information retrieval, and influence maximization, to name a few (see Iyer et al. (2020) and the references within).
In many settings, the data is not available in a random-access model, either for external reasons (customers arriving online) or because of the massive amount of data. When the data becomes too big to store in memory, we look for streaming algorithms that pass (once) through the data and efficiently decide for each element whether to discard it or keep it in a small buffer in memory.
In this work, we consider algorithms that process elements arriving in a random order. Note that the classical greedy algorithm, which iteratively adds the best element Nemhauser et al. (1978), cannot be used here, and hence we must look for new algorithmic techniques. The motivation for considering random element arrival comes from the prevalence of submodularity in big-data applications, in which data is often logged in batches that can be modelled as random samples from an underlying distribution.^1 The problem of streaming submodular maximization has recently received significant attention, both for random arrival order and for the more pessimistic worst-case arrival order Agrawal et al. (2019); Alaluf et al. (2020); Alaluf and Feldman (2019); Badanidiyuru et al. (2014); Badanidiyuru and Vondrák (2014); Chekuri et al. (2015); Feldman et al. (2018, 2020); Huang et al. (2020); Indyk and Vakilian (2019); Kazemi et al. (2019); McGregor and Vu (2019); Norouzi-Fard et al. (2018); Huang et al. (2020); Shadravan (2020).
^1 Note that the random order assumption generalizes the typical assumption of i.i.d. sampling of stream elements from a known distribution, as the random order assumption applies equally well to unknown distributions.
1.1 Submodular functions and the streaming model
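Let $f: 2^E \to \mathbb{R}_{+}$ be a nonnegative set function satisfying $f(S \cup \{e\}) - f(S) \geq f(T \cup \{e\}) - f(T)$ for all $S \subseteq T \subseteq E \setminus \{e\}$. Such a function is called submodular. For simplicity, we assume $f(\emptyset) = 0$.^2 We use the shorthand $f(e \mid S) := f(S \cup \{e\}) - f(S)$ to denote the marginal of $e$ on top of $S$. When $f(S) \leq f(T)$ for all $S \subseteq T$, $f$ is called monotone. We consider the following optimization problem:
$$\max_{S \subseteq E,\ |S| \leq k} f(S),$$
where $k$ is a cardinality constraint for the solution size.
^2 Submodular functions with $f(\emptyset) > 0$ only improve the approximation ratio of our algorithms.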
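To make the diminishing-returns property concrete, the following minimal sketch evaluates a small coverage function and the marginal gain of one item on top of growing sets. The toy instance `SETS` and the helper names `coverage` and `marginal` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy instance: each item covers a few points of a universe.
SETS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}


def coverage(selection: Iterable[str]) -> int:
    """f(S): number of universe points covered by the chosen items."""
    covered: Set[int] = set()
    for item in selection:
        covered |= SETS[item]
    return len(covered)


def marginal(item: str, selection: Iterable[str]) -> int:
    """Marginal gain of `item` on top of `selection`."""
    base = set(selection)
    return coverage(base | {item}) - coverage(base)


# The gain of "b" can only shrink as the base set grows.
gain_small = marginal("b", [])           # nothing covered yet
gain_mid = marginal("b", ["a"])          # point 3 is already covered
gain_large = marginal("b", ["a", "c"])   # points 3 and 4 are already covered
```

Coverage functions are monotone as well as submodular, so they fit the monotone case discussed in this paper.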
Our focus is on submodular maximization in the streaming setting. In this setting, an algorithm is given a single pass over a dataset in a streaming fashion, where the stream is some permutation of the input dataset and each element is seen once. The stream is in random order when the permutation is uniformly random. When there are no constraints on the stream order, we call the stream adversarial.
At each step of the stream, our algorithm is allowed to maintain a buffer of input elements. When a new element is streamed in, the algorithm can choose to add the element to its buffer. To be as general as possible, we assume an oracle model; that is, we assume there is an oracle that returns the value of $f(S)$ for any set $S$. The decision of the algorithm to add an element is based only on queries to the oracle on subsets of the buffer elements. The algorithm may also choose to throw away buffered elements at any given time. The goal in the streaming model is to minimize memory use, and the complexity of the algorithm is the maximum number of input elements the algorithm stores in the buffer at any given time.
For the oracle model, an important distinction is between weak oracle access and strong oracle access. In the weak oracle setting, the algorithm is only allowed to query sets of feasible elements (sets that have cardinality less than ). In the strong oracle setting, however, the algorithm is allowed to query any set of elements. All our results apply to both the weak and strong oracle models.
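The weak-oracle restriction can be pictured as a thin wrapper around the value oracle. The sketch below is only an illustration: the class name `WeakOracle`, the query counter, and the choice to treat "feasible" as "at most `k` elements" are assumptions made here.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

SetFunction = Callable[[FrozenSet[Hashable]], float]


class WeakOracle:
    """Value oracle that refuses queries on infeasible (too large) sets."""

    def __init__(self, f: SetFunction, k: int) -> None:
        self._f = f
        self._k = k
        self.queries = 0  # number of answered queries, for bookkeeping only

    def __call__(self, subset: Iterable[Hashable]) -> float:
        query = frozenset(subset)
        # Assumption: a set is feasible when it has at most k elements.
        if len(query) > self._k:
            raise ValueError("weak oracle: query exceeds the cardinality bound")
        self.queries += 1
        return self._f(query)
```

A strong oracle would correspond to the same wrapper without the size check.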
Our aim will be to develop algorithms that only make one pass over the data stream, using memory, where is the maximum size of a solution set in . We assume that is small relative to , the size of the ground set.
1.2 Our contributions
On the algorithmic side, we give $(1 - 1/e - \varepsilon)$ and $(1/e - \varepsilon)$ approximations for cardinality constrained monotone and nonmonotone submodular maximization respectively, both using $O(k/\varepsilon)$ memory. The monotone result has an exponential improvement in memory requirements compared to Agrawal et al. (2019) (in terms of the dependence on $\varepsilon$), while the nonmonotone result is the first to appear in the random-order streaming model, and improves upon the best known polynomial-time approximations under adversarial orders Alaluf et al. (2020). The algorithms are extremely simple to implement, and perform well on real-world data (see Section 5), even compared to offline greedy algorithms.
On the hardness side, we prove that a approximation for monotone submodular maximization would require memory (even with unlimited queries and computational power). This improves the hardness bound of from Agrawal et al. (2019).
1.3 Related work
Prior work on this problem has focused on both the adversarial and random-order streaming settings. Algorithmic and hardness results further depend on whether the function is monotone or nonmonotone, and on whether $f$ has explicit structure (e.g., a set system presented for $f$ in the coverage case) or is accessible only via oracle queries. Table 1 describes all the relevant results.
[Table 1: known results. Columns: adversarial order, random order. Rows: monotone, nonmonotone.]
Algorithmic results.
Submodular maximization in the streaming setting was first considered by Badanidiyuru et al. (2014), who gave a approximation in memory for monotone submodular functions under a cardinality constraint, using a thresholding idea with parallel enumeration of possible thresholds. This work led to a number of subsequent developments, with the current best being a approximation in memory Kazemi et al. (2019). It turns out that the factor of is the best possible in the adversarial setting (with a weak oracle), but an improvement is possible in the random order model (the input is ordered by a uniformly random permutation). This was first shown by Norouzi-Fard et al. (2018), who proved that the hardness barrier for cardinality constraints can be broken, exhibiting a approximation in memory where . In a breakthrough work, Agrawal, Shadravan, and Stein (2019) gave a approximation using memory and running time. We note that this is arbitrarily close to the optimal factor of , but the algorithm is not practical, due to its dependence on (even for , the resulting constants are astronomical).
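The thresholding idea can be illustrated with a single pass for one fixed guess of the optimal value. This is a minimal sketch rather than the full method: the function name `threshold_pass` and the parameter `opt_guess` are assumptions made here, and the original algorithm runs many such passes in parallel over a grid of guesses and keeps the best result.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

SetFunction = Callable[[FrozenSet[Hashable]], float]


def threshold_pass(
    stream: Iterable[Hashable],
    f: SetFunction,
    k: int,
    opt_guess: float,
) -> List[Hashable]:
    """One pass that keeps an element when its marginal gain clears a threshold."""
    solution: List[Hashable] = []
    for e in stream:
        if len(solution) >= k:
            break  # the cardinality budget is exhausted
        current = frozenset(solution)
        value = f(current)
        gain = f(current | {e}) - value
        # Remaining distance to opt_guess / 2, spread over the free slots.
        threshold = (opt_guess / 2 - value) / (k - len(solution))
        if gain >= threshold:
            solution.append(e)
    return solution
```

Each parallel guess stores at most `k` elements, which is where the extra factor in the memory of threshold-enumeration methods comes from.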
Lower bounds.
A few lower bounds are known for monotone functions in the adversarial order model: with a weak oracle, any approximation would require memory Norouzi-Fard et al. (2018). Under a strong oracle, a lower bound of memory for any approximation algorithm was shown in a recent paper by Feldman et al. (2020). Another recent lower bound was proved by McGregor and Vu (2019): a approximation for coverage functions requires memory (this lower bound holds for explicitly given inputs, via communication complexity; we note that this is incomparable to the computational hardness of maximum coverage Feige (1998)). For nonmonotone functions, Alaluf et al. (2020) proved an memory lower bound for the adversarial order model with unbounded computation.
In the random-order model, Agrawal et al. (2019) show that beating (for monotone submodular functions) requires memory. In contrast, we show that the same construction as that of McGregor and Vu (2019) also applies to randomly ordered streams: for coverage functions requires memory even in the random-order model.
Submodular maximization in related models.
A closely related model is the secretary with shortlists model Agrawal et al. (2019), where an algorithm is allowed to store a shortlist of more than $k$ items (where $k$ is the cardinality constraint). Unlike the streaming model, however, once an element goes into the shortlist, it cannot be removed. Then, after seeing the entire stream, the algorithm chooses a subset of size $k$ from the shortlist and returns that to the user. We note that the algorithms developed in this paper apply almost without modification to the shortlists model.
1.4 Overview of our techniques
Main algorithmic techniques.
The primary impetus for our algorithmic work was an effort to avoid the extensive enumeration involved in the algorithm of Agrawal et al. (2019), which leads to memory requirements exponential in $1/\varepsilon$.
To make things concrete, let us consider the input divided into disjoint windows of consecutive elements. The windows containing actual optimal elements play a special role: we call them active windows, and these are the windows where we make quantifiable progress. When the stream is randomly ordered, we would ideally like to have each new element sampled independently and uniformly from the input. This leads to the intuition that the optimal elements are evenly spread out through all the windows. This cannot be literally true, since conditioned on the history of the stream, some elements have already appeared and cannot appear again. However, a key idea of Agrawal et al. (2019) allows us to circumvent this by reinserting the elements that we have already seen and that played a role in the selection process. What needs to be proved is that elements that were not selected can still appear in the future, conditioned on the history of the algorithm; that turns out to be true, provided that our algorithm operates in a certain greedy-like manner.
To ensure progress was made regardless of the positioning of the optimal elements, previous work made use of exponentially large enumerations to essentially guess which windows the optimal elements arrive in. Where we depart from previous work is the way we build our solution. The idea is to use an evolving family of solutions which are updated in parallel, so that we obtain a quantifiable gain regardless of where the optimal elements arrived. Specifically, we grow solutions in parallel, where solution has cardinality . In each window, we attempt to extend a collection of solutions (for varying ) by a new element ; if is beneficial on average to every in the collection, we replace each with the new solution . Regardless of which windows happen to be active, we will show that the average gain over our evolving collection of solutions is analogous to the greedy algorithm. This is the basis of the analysis that leads to a factor of .
In addition to our candidate solutions, we maintain a pool of elements that our algorithm has ever included in some candidate solution. We then use this pool to reintroduce elements artificially back into the input; this makes it possible to assume that every input element still appears in a future window with the same probability, which is key to the probabilistic analysis leading to the approximation guarantee.
Nonmonotone functions.
Our algorithm for nonmonotone submodular functions is similar, with the caveat that here we also have to be careful about not including any element in the solution with large probability. This is an important aspect of the randomized greedy algorithm for (offline) nonmonotone submodular maximization Buchbinder et al. (2014), which in each step randomly includes one of the top $k$ elements in terms of marginal values. We achieve a similar property by choosing the top element from the current window and a random subset of the pool of prior elements.
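For reference, the offline randomized greedy idea mentioned above can be sketched as follows. This is a simplified illustration written for this section, not code from Buchbinder et al. (2014): the function name `random_greedy` is an assumption, and slots without a positive-gain element are treated as zero-value dummy elements.

```python
import random
from typing import Callable, FrozenSet, Hashable, Sequence

SetFunction = Callable[[FrozenSet[Hashable]], float]


def random_greedy(
    items: Sequence[Hashable],
    f: SetFunction,
    k: int,
    rng: random.Random,
) -> FrozenSet[Hashable]:
    """In each of k rounds, add a uniformly random one of the top-k marginals."""
    solution: FrozenSet[Hashable] = frozenset()
    for _ in range(k):
        base = f(solution)
        gains = [
            (f(solution | {e}) - base, idx, e)
            for idx, e in enumerate(items)
            if e not in solution
        ]
        # Keep the k largest marginals; indices break ties deterministically.
        top = sorted(gains, key=lambda t: (-t[0], t[1]))[:k]
        # Choose one of k slots uniformly; an empty or non-positive slot
        # plays the role of a dummy element and adds nothing.
        slot = rng.randrange(k)
        if slot < len(top) and top[slot][0] > 0:
            solution = solution | {top[slot][2]}
    return solution
```

Because every element is picked with probability at most 1/k per round, no single element enters the solution with large probability, which is the property the streaming variant tries to reproduce.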
Hardness results.
Our hardness instances have the following general structure: there is a special subset of good elements, and the remaining elements are bad. The good elements are indistinguishable from each other, and ditto for the bad elements. In the monotone case, any bad elements are a factor worse than the optimal solution ( good elements). Suppose furthermore that for parameter , as long as we never query the function on a subset with good elements, the good elements are indistinguishable from bad elements. The only way to collect good elements in the memory buffer is by chance: until we have collected the required number of good elements, they are indistinguishable from bad elements, so the subset in the memory buffer is random. The classic work of Nemhauser and Wolsey (1978) constructs a pathological monotone submodular function with , which we use to prove that without memory the algorithm cannot beat . McGregor and Vu (2019) construct a simple example of a coverage function with , which we use for our bound. For an exponential-size ground set, we extend their construction to which translates to the improved lower bound of .
2 A $(1 - 1/e - \varepsilon)$ approximation in $O(k/\varepsilon)$ memory
In this section, we develop a simple algorithm for optimizing a monotone submodular function with respect to a cardinality constraint. For the sake of exposition, we focus on the intuition behind the results and relegate full proofs to the appendix.
Our algorithm begins by randomly partitioning the stream into contiguous windows of expected size , where is a parameter controlling the memory dependence and approximation ratio. This is done by generating a random partition according to Algorithm 1. As the algorithm progresses, it maintains partial solutions, the th of which contains exactly elements. Within each window we process all the elements independently, and choose one candidate element to extend the partial solutions by. We then add to a collection of partial solutions at the end of the window. The range of partial solution sizes that we use roughly tracks the number of optimal elements we are expected to have seen so far in the stream.
Intuitively, our algorithm is guaranteed to make progress on windows that contain an element from the optimal solution . Let us loosely call such windows active (a precise definition will be given later). Of course, the algorithm never knows which windows are active. However, the key idea of our analysis is that we are able to track the progress that our algorithm makes on active windows. Since the input stream is uniformly random, intuitively we expect to see optimal elements after processing windows. With high probability, the true number of optimal elements seen will be in the range . By focusing on the average improvement over levels in , we can show that each level in this range gains in expectation, whenever a (random) optimal element arrives.
For the analysis to work, ideally we would like each arriving optimal element to be selected uniformly among all optimal elements. This is not true conditioned on the history of decisions made by the algorithm. However, we can remedy this by reinserting elements that we have selected before and subsampling the elements in the current window, with certain probabilities. A key lemma (Lemma 2.2) shows why this works, since the elements we have never included might still appear, given the history of the algorithm. Our basic algorithm is described in Algorithm 2, with the window partitioning procedure described in Algorithm 1.
In its most basic implementation, Algorithm 2 requires memory (to store and ’s for ). However, there are several optimizations we can make. Algorithm 2 can be implemented in a way that the ’s are not directly stored at all. To avoid storing the ’s, we can augment to contain not just , but also the index of the window it was added in. The index of the window tells us the range of levels that was inserted into, so all of the ’s can be reconstructed from as contains a history of all the insertions. Thus the memory use of Algorithm 2 is the size of at the end of the stream. Since there are windows and each window introduces at most element to , we have the following observation:
Observation 2.1.
Algorithm 2 uses at most space and time.
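The multi-level scheme described in this section can be sketched in a few lines. This is a simplified illustration and not the paper's exact Algorithm 2: the function name `multilevel_stream`, the parameter `spread` (the half-width of the range of levels that are updated), the reinsertion rate, and the update rule are assumptions made here, and the partial solutions are stored explicitly instead of being reconstructed from the pool.

```python
import random
from typing import Callable, FrozenSet, Hashable, List, Sequence

Element = Hashable
SetFunction = Callable[[FrozenSet[Element]], float]


def multilevel_stream(
    windows: Sequence[Sequence[Element]],
    f: SetFunction,
    k: int,
    alpha: int,
    spread: int,
    rng: random.Random,
) -> FrozenSet[Element]:
    """Simplified multi-level sketch; NOT the paper's exact Algorithm 2."""
    # levels[j] is a partial solution with at most j elements.
    levels: List[FrozenSet[Element]] = [frozenset() for _ in range(k + 1)]
    pool: List[Element] = []  # elements ever added to some level
    num_windows = len(windows)

    for i, window in enumerate(windows, start=1):
        # Assumed rate: each pooled element is reinserted w.p. 1/num_windows.
        reinserted = [e for e in pool if rng.random() < 1.0 / num_windows]
        candidates = list(window) + reinserted
        if not candidates:
            continue

        # Levels near i/alpha, the rough number of optimal elements seen so far.
        center = i // alpha
        low = max(0, center - spread)
        high = min(k - 1, center + spread)
        active = range(low, high + 1)

        def total_gain(e: Element) -> float:
            # Summed improvement from replacing levels[j + 1] by levels[j] + e.
            return sum(f(levels[j] | {e}) - f(levels[j + 1]) for j in active)

        best = max(candidates, key=total_gain)
        if total_gain(best) > 0:
            # Build every replacement from the old levels before writing any.
            updates = {j + 1: levels[j] | {best} for j in active}
            for j, new_solution in updates.items():
                levels[j] = new_solution
            if best not in pool:
                pool.append(best)

    return max(levels, key=f)
```

In this sketch each window adds at most one element to `pool`, which mirrors the counting argument behind Observation 2.1.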
When the input is streamed in random order, our partitioning procedure (Algorithm 1) has a much simpler interpretation. (A similar lemma can be found in the appendix of Agrawal et al. (2019).)
Lemma 2.1.
Suppose is streamed according to a permutation chosen at random and we partition by Algorithm 1 into windows. This is equivalent to assigning each to one of different buckets uniformly and independently at random.
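One way to realize a partition with the property stated in Lemma 2.1 is to throw one ball per stream position into the buckets and then cut the stream into contiguous windows of the resulting sizes. The sketch below assumes that this is how the window sizes are drawn; Algorithm 1 itself is not reproduced in this section, and the names `partition_windows` and `num_windows` are illustrative.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_windows(
    stream: Sequence[T],
    num_windows: int,
    rng: random.Random,
) -> List[List[T]]:
    """Cut `stream` into contiguous windows with balls-into-bins sizes."""
    # Draw a bucket for every stream position and keep only the counts.
    sizes = [0] * num_windows
    for _ in range(len(stream)):
        sizes[rng.randrange(num_windows)] += 1

    # Slice the stream into consecutive windows of those sizes.
    windows: List[List[T]] = []
    start = 0
    for size in sizes:
        windows.append(list(stream[start:start + size]))
        start += size
    return windows
```

Only the window sizes are random here; the contents of each window are determined by the (random) order of the stream itself.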
The algorithm’s performance and behavior depend on the ordering of . Let us define the history of the algorithm up to the th window, denoted , to be the sequence of all solutions produced up to that point. (Note that this history is only used in the analysis.) More precisely, we define the history as follows.
Definition 2.1.
Let denote the state of the set maintained by the algorithm, before processing window . We define to be the set of all triples such that element was added to solution in window . In other words, contains all of the changes that the algorithm made to its state while processing the first windows. For convenience, sometimes we treat as a set of elements and say that if .
The history describes the entire memory state of the algorithm up to the end of window . In the following, we analyze the performance of the algorithm in the th window conditioned on the history . Note that different random permutations of the input may produce this history, and we average over all of them in the analysis.
The next key lemma captures the intuition that elements not selected by the algorithm so far could still appear in the future, and bounds the probability with which this happens.
Lemma 2.2.
Fix a history . For any element , and any , we have
Next, we define a set of active windows. In each active window, the algorithm is expected to make significant improvements to its candidate solution. The active windows will only be used in the analysis of the algorithm, and need not be computed in any way.
Definition 2.2.
Let be the optimal solution. For window , let be the probability that given . Define its active set to be the union of and the set obtained by sampling each with probability . We call an active window if and we call the active optimal elements of window .
Note that the construction of active sets in Definition 2.2 is valid as Lemma 2.2 guarantees . More importantly, the active window subsamples the optimal elements so that each element appears in with probability exactly regardless of the history . This allows us to tightly bound the number of active windows in the input, as we show in the next lemma.
Lemma 2.3.
Suppose we have streamed up to the th window of the input for some . Then the expected number of active windows seen so far satisfies
Furthermore, the actual number of windows concentrates around to within with probability .
Next we analyze the expected gain in the solution after processing each active window. Let be the event that window is active.
Lemma 2.4.
Let where and are the values of and defined in Algorithm 2 on window . Conditioned on a history and window being active,
(1) 
Under an ideal partition, each element of appears in a different window, with one optimal element appearing roughly once every windows. Thus after active windows, we expect to obtain a approximation (as in the standard greedy analysis).
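The standard greedy calculation referred to here can be written out as follows. The notation is an assumption made for this illustration: $O$ is the optimal solution, $k$ the cardinality constraint, and $S_t$ an idealized solution after $t$ active windows, each of which is assumed to gain at least a $1/k$ fraction of the remaining gap in expectation.

```latex
\begin{align*}
\mathbb{E}\bigl[f(S_{t+1}) - f(S_t)\bigr]
  &\ge \frac{1}{k}\Bigl(f(O) - \mathbb{E}[f(S_t)]\Bigr) \\
\Longrightarrow\quad
f(O) - \mathbb{E}[f(S_{t+1})]
  &\le \Bigl(1 - \frac{1}{k}\Bigr)\Bigl(f(O) - \mathbb{E}[f(S_t)]\Bigr) \\
\Longrightarrow\quad
\mathbb{E}[f(S_k)]
  &\ge \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr) f(O)
   \;\ge\; \Bigl(1 - \frac{1}{e}\Bigr) f(O).
\end{align*}
```

The last step unrolls the recurrence $k$ times starting from $f(S_0) \ge 0$ and uses $(1 - 1/k)^k \le 1/e$.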
Theorem 2.5.
The expected value of the best solution found by the algorithm is at least
Setting , we have a approximation using memory.
3 A $(1/e - \varepsilon)$ approximation for nonmonotone submodular maximization
In this section, we show that the basic algorithm described in Algorithm 2 can be altered to give a approximation to the cardinality constrained nonmonotone case (Algorithm 3).
Algorithm 3 uses the same kind of multilevel scheme as Algorithm 2. However, Algorithm 3 further subsamples the elements of the input so that the probability of including any element is exactly (lines 7–13, coloured in orange). The subsampling allows us to bound the maximum probability that an element of the input is included in the solution. In particular, the subsampling is done by having the algorithm compute (on the fly) the conditional probability that an element could have been selected had it appeared in the past. This gives us the ability to compute an appropriate subsampling probability to ensure that does not appear in with too high a probability. In terms of the proof, the subsampling allows us to perform a similar analysis to the RandomGreedy algorithm of Buchbinder et al. (2014).^3
^3 A difference here is that instead of analysing a random element of the top marginals, we analyse the optimal set directly.
Since many of the main ideas are the same, we relegate the details of the analysis to the appendix.
Implementation of Algorithm 3
For clarity of exposition, we compute up front in line 4. However, we can compute them on the fly in practice since each element only uses its value of once (lines 10 and 12). This avoids an memory cost associated with storing each . Finally, we assume that there are no ties when computing the best candidate element in each window. Ties can be handled by any arbitrary but consistent tiebreaking procedure. Any additional information used to break the ties (for example an ordering on the elements ) must be stored alongside for the computation of (line 10).
Theorem 3.1.
Algorithm 3 obtains a $(1/e - \varepsilon)$ approximation for maximizing a nonmonotone function with respect to a cardinality constraint in $O(k/\varepsilon)$ memory.
We remark that Algorithm 3 also achieves a guarantee of for the monotone case, as Lemma 2.4 and Theorem 2.5 both still apply to Algorithm 3 when is monotone. The main difference between the two is the subsampling (lines 7–13), which increases the running time of the algorithm.
4 hardness for monotone submodular maximization
The proofs of the following propositions and lemmas may be found in the appendix.
Proposition 4.1.
Fix subsets of elements (denoting “good” and “bad”) such that and ; let be some parameter. Let denote the size of the memory buffer, and let denote the probability that a random subset of size contains at least good elements. Let be a function that satisfies the following symmetries:

is symmetric over good (resp. bad) elements, namely there exists such that

For any set with good elements, does not distinguish between good and bad elements, namely for ,
Then any algorithm has expected value at most
(2) 
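The probability that a random buffer contains enough good elements is a hypergeometric tail, which can be computed exactly for small instances. The sketch below is an illustration with assumed parameter names (`n` for the ground-set size, `num_good`, `buffer_size`, `needed`); the symbols used in Proposition 4.1 are not reproduced here.

```python
from math import comb


def prob_at_least_good(n: int, num_good: int, buffer_size: int, needed: int) -> float:
    """P[a uniformly random subset of size buffer_size has >= needed good elements]."""
    total = comb(n, buffer_size)
    # Sum over the possible numbers g of good elements in the subset.
    favourable = sum(
        comb(num_good, g) * comb(n - num_good, buffer_size - g)
        for g in range(needed, min(num_good, buffer_size) + 1)
    )
    return favourable / total
```

For example, with 10 elements of which 2 are good, a random buffer of size 2 contains at least one good element with probability 17/45. When the good elements are a small fraction of the ground set and the buffer is small, this tail probability is tiny, which is the quantitative heart of the hardness argument.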
We now consider a few different ’s that satisfy the desiderata of Proposition 4.1.
Lemma 4.1 (monotone submodular function Nemhauser and Wolsey (1978)).
There exists a monotone submodular that satisfies the desiderata of Proposition 4.1 for , and such that:

.

.
Lemma 4.2 (polynomial-universe coverage function McGregor and Vu (2019)).
There exists a (monotone submodular) coverage function over a polynomial universe that satisfies the desiderata of Proposition 4.1 for , and such that:

.

.
Lemma 4.3 (exponential-universe coverage function (new construction)).
There exists a (monotone submodular) coverage function over an exponential universe that satisfies the desiderata of Proposition 4.1 for , and such that:

.

.
Our main hardness result follows from the lemmas above:
Theorem 4.4.
Any approximation algorithm in the random order strong oracle model must use the following memory:

for a general monotone submodular function.

for a coverage function over a polynomial universe.

for a coverage function over an exponential universe.
5 Experimental results
In the following section, we give experimental results for our monotone streaming algorithm. Due to space limitations, the experiments for the nonmonotone algorithm can be found in the appendix. Our main goal is to show that our algorithm performs well in a practical setting and is simple to implement. In fact, we show that our algorithm is on par with offline algorithms in performance, and returns competitive solutions across a variety of datasets. All experiments were performed on a 2.7 GHz dual-core Intel i7 CPU with 16 GB of RAM.
We compare the approximation ratios obtained by our algorithm with three benchmarks:

The offline LazyGreedy algorithm Minoux (1978), which is both theoretically optimal and obtains the same solution as greedy (in faster time). Note that we don’t expect to outperform it with a streaming algorithm; but as we hoped, our algorithm comes close.

The SieveStreaming algorithm of Badanidiyuru et al. Badanidiyuru et al. (2014), which is the first algorithm to appear for adversarial streaming submodular optimization.

The Salsa algorithm of Norouzi-Fard et al. (2018), which is the first “beyond ” approximation algorithm for random-order streams. This algorithm runs several variants of SieveStreaming in parallel with thresholds that change as the algorithm progresses through the stream.^4 As we would expect, our algorithm performs better on random arrival streams.
^4 SieveStreaming is also known as threshold greedy in the literature Badanidiyuru and Vondrák (2014). Note that the later SieveStreaming++ algorithm of Kazemi et al. (2019) is more efficient, but for approximation ratio SieveStreaming is a stronger benchmark for adversarial order streaming.
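For readers unfamiliar with the offline baseline, lazy greedy evaluation can be sketched as below. This is a generic illustration of the lazy-evaluation technique, not the benchmark code used in the experiments; the function name `lazy_greedy` and the early stop on non-positive gains are choices made here.

```python
import heapq
from typing import Callable, FrozenSet, Hashable, List, Sequence

SetFunction = Callable[[FrozenSet[Hashable]], float]


def lazy_greedy(items: Sequence[Hashable], f: SetFunction, k: int) -> List[Hashable]:
    """Greedy selection that re-evaluates a marginal gain only when needed."""
    selected: List[Hashable] = []
    current: FrozenSet[Hashable] = frozenset()
    current_value = f(current)
    # Max-heap via negated gains; the index breaks ties without comparing items.
    heap = [
        (-(f(frozenset([e])) - current_value), idx, e)
        for idx, e in enumerate(items)
    ]
    heapq.heapify(heap)

    while heap and len(selected) < k:
        _, idx, e = heapq.heappop(heap)
        fresh = f(current | {e}) - current_value
        # By submodularity, stale gains are upper bounds on true gains, so if
        # the refreshed gain beats the best stale bound, e is the true argmax.
        if not heap or fresh >= -heap[0][0]:
            if fresh <= 0:
                break  # no remaining element improves the solution
            selected.append(e)
            current = current | {e}
            current_value += fresh
        else:
            heapq.heappush(heap, (-fresh, idx, e))
    return selected
```

Up to tie-breaking, the selection matches plain greedy while typically making far fewer oracle calls, which is why it serves as an efficient offline reference point.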
Note that in terms of memory use, our algorithm is strictly more efficient. The analysis in previous sections shows that the memory is (with a small constant), versus for both SieveStreaming and Salsa. Thus in the experiments below, we focus on the approximation ratio obtained by our algorithm.
[Figure 1: Performance of the algorithms on each data set (averaged across 10 runs; shaded regions represent variance across different random orderings).]
Datasets
Our datasets are drawn from set coverage instances from the 2003 and 2004 workshops on Frequent Itemset Mining Implementations on Data Mining (2003) and the Steiner triple instances of Beasley Beasley (1987). For each data set we run the three algorithms for cardinality constraints varying from to . The results of the algorithms are averaged across 10 random stream orderings. Table 2 describes the data sources. Figure 1 shows the performance of the three algorithms on each data set. All code can be found at https://github.com/whereispaul/submodularstreaming and all datasets can be found at https://tinyurl.com/neurips21.
dataset        source                                    # of sets   universe size
-----------    --------------------------------------    ---------   -------------
accidents      (anonymized) traffic accident data        340183      468
chess          UCI ML Repository                         3196        75
connect        UCI ML Repository                         67557       129
kosarak        (anonymized) clickstream data             990002      41270
mushroom       UCI ML Repository                         8124        119
pumsb          census data for population and housing    49046       7116
pumsb_star     census data for population and housing    49046       7116
retail         (anonymized) retail market basket data    88162       16469
T40I10D100K    generator from IBM Quest research         100000      999
6 Conclusion and Future Work
In this work, we have presented memory-optimal algorithms for the problem of maximizing submodular functions with respect to cardinality constraints in the random order streaming model. Our algorithms achieve an optimal approximation factor of $1 - 1/e - \varepsilon$ for the monotone submodular case, and an approximation factor of $1/e - \varepsilon$ for the nonmonotone case. In addition to theoretical guarantees, we show that the algorithm outperforms the existing state of the art on a variety of datasets.
We close with a few open questions that would make for interesting future work. Although our algorithm is memory-optimal, it is not runtime-optimal. In particular, the SieveStreaming Badanidiyuru et al. (2014) and Salsa Norouzi-Fard et al. (2018) algorithms both run in time , whereas our algorithm runs in time . The nonmonotone variant of our algorithm runs even slower, as it needs to perform subsampling operations that take at least per stream element in its current form. Improving this runtime would greatly improve the practicality of our algorithm for extremely large cardinality constraints. Finally, there has been recent interest in examining streaming algorithms for streams under “adversarial injection” Garg et al. (2020). In such streams, the optimal elements of the stream are randomly ordered, while adversarial elements can be injected between the optimal elements with no constraints. Despite the seemingly large power of the adversary, the approximation barrier can still be broken in this model. It would be interesting to see if the work in this paper can be extended to such a setting.
The authors are indebted to Mohammad Shadravan and Morteza Monemizadeh for their insightful discussions that no doubt improved this work. The first author is supported by a VMWare Fellowship and the Natural Sciences and Engineering Research Council of Canada. The second and fourth authors are supported by NSF CCF1954927, and the second author is additionally supported by a David and Lucile Packard Fellowship.
References
 [1] (2019) Submodular Secretary Problem with Shortlists. In 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 1012, 2019, San Diego, California, USA, LIPIcs, Vol. 124, pp. 1:1–1:19. External Links: Link, Document Cited by: Remark A.1, Cardinality constrained submodular maximization for random streams, §1.2, §1.2, §1.3, §1.3, §1.3, §1.4, §1.4, Table 1, §1, §2.
 [2] (202002) Optimal Streaming Algorithms for Submodular Maximization with Cardinality Constraints. arXiv:1911.12959 [cs]. Note: arXiv: 1911.12959 External Links: Link Cited by: §1.2, §1.3, Table 1, §1.
 [3] (201906) Making a Sieve Random: Improved SemiStreaming Algorithm for Submodular Maximization under a Cardinality Constraint. arXiv:1906.11237 [cs]. Note: arXiv: 1906.11237 External Links: Link Cited by: §1.
 [4] (2014) Streaming Submodular Maximization: Massive Data Summarization on the Fly. In The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA  August 24  27, 2014, S. A. Macskassy, C. Perlich, J. Leskovec, W. Wang, and R. Ghani (Eds.), pp. 671–680. External Links: Link, Document Cited by: §1.3, §1, 2nd item, §6.
 [5] (2014) Fast Algorithms for Maximizing Submodular Functions. In Proceedings of the TwentyFifth Annual ACMSIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 57, 2014, C. Chekuri (Ed.), pp. 1497–1514. External Links: Link, Document Cited by: §1, footnote 4.
 [6] (1987) An algorithm for set covering problem. European Journal of Operational Research 31 (1), pp. 85 – 93. External Links: ISSN 03772217, Document, Link Cited by: §5.
 [7] (2014) Submodular Maximization with Cardinality Constraints. In Proceedings of the TwentyFifth Annual ACMSIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 57, 2014, C. Chekuri (Ed.), pp. 1433–1452. External Links: Link, Document Cited by: Appendix C, Appendix C, Lemma C.3, Appendix C, Appendix D, §1.4, §3.
 [8] (2015) Streaming algorithms for submodular function maximization. In Automata, Languages, and Programming  42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 610, 2015, Proceedings, Part I, M. M. Halldórsson, K. Iwama, N. Kobayashi, and B. Speckmann (Eds.), Lecture Notes in Computer Science, Vol. 9134, pp. 318–330. External Links: Link, Document Cited by: §1.

[9]
(2020)
Probabilistic Tools for the Analysis of Randomized Optimization Heuristics
. arXiv:1801.06733 [cs, math], pp. 1–87. Note: arXiv: 1801.06733Comment: 91 pages External Links: Link, Document Cited by: Appendix A, Appendix A.  [10] (1998) A Threshold of ln n for Approximating Set Cover. J. ACM 45 (4), pp. 634–652. External Links: Link, Document Cited by: §1.3.
 [11] (2018) Do less, get more: streaming submodular maximization with subsampling. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 38 December 2018, Montréal, Canada, S. Bengio, H. M. Wallach, H. Larochelle, K. Grauman, N. CesaBianchi, and R. Garnett (Eds.), pp. 730–740. External Links: Link Cited by: Table 1, §1.

 [12] (2020) The One-way Communication Complexity of Submodular Maximization with Applications to Streaming and Robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2020, Chicago, IL, USA, June 22-26, 2020, K. Makarychev, Y. Makarychev, M. Tulsiani, G. Kamath, and J. Chuzhoy (Eds.), pp. 1363–1374. External Links: Link, Document Cited by: §1.3, Table 1, §1.
 [13] (2020) Robust algorithms under adversarial injections. In 47th International Colloquium on Automata, Languages, and Programming, ICALP 2020, July 8-11, 2020, Saarbrücken, Germany (Virtual Conference), A. Czumaj, A. Dawar, and E. Merelli (Eds.), LIPIcs, Vol. 168, pp. 56:1–56:15. External Links: Link, Document Cited by: §6.
 [14] (2020-02) Approximability of Monotone Submodular Function Maximization under Cardinality and Matroid Constraints in the Streaming Model. arXiv:2002.05477 [cs] (en). Note: arXiv: 2002.05477. External Links: Link Cited by: Table 1, §1.

 [15] (2020) Improved multi-pass streaming algorithms for submodular maximization with matroid constraints. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2020, August 17-19, 2020, Virtual Conference, J. Byrka and R. Meka (Eds.), LIPIcs, Vol. 176, pp. 62:1–62:19. External Links: Link, Document Cited by: §1.
 [16] (2019) Tight Trade-offs for the Maximum k-Coverage Problem in the General Streaming Model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, D. Suciu, S. Skritek, and C. Koch (Eds.), pp. 200–217. External Links: Link, Document Cited by: §1.
 [17] (2020) Submodular Combinatorial Information Measures with Applications in machine learning. CoRR abs/2006.15412. External Links: Link, 2006.15412 Cited by: §1.
 [18] (2019) Submodular Streaming in All its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, pp. 3311–3320. Cited by: §1.3, Table 1, §1, footnote 4.
 [19] (2021) Diversity on the go! streaming determinantal point processes under a maximum induced cardinality objective. In The Web Conference 2021, TheWebConf 2021, Cited by: Appendix D.
 [20] (2019) Better Streaming Algorithms for the Maximum Coverage Problem. Theory Comput. Syst. 63 (7), pp. 1595–1619. External Links: Link, Document Cited by: §1.3, §1.3, §1.4, §1, Lemma 4.2.
 [21] (1978) Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pp. 234–243. Cited by: 1st item.
 [22] (1978) Best algorithms for approximating the maximum of a submodular set function. Mathematics of operations research 3 (3), pp. 177–188. Cited by: §1.4, Lemma 4.1.
 [23] (1978) An Analysis of Approximations for Maximizing Submodular Set Functions  I. Math. Program. 14 (1), pp. 265–294. External Links: Link, Document Cited by: §1.
 [24] (2018) Beyond 1/2-approximation for submodular maximization on massive data streams. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, J. G. Dy and A. Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, pp. 3826–3835. External Links: Link Cited by: §1.3, §1.3, Table 1, §1, 3rd item, §6.
 [25] (2003 and 2004) Workshops on Frequent Itemset Mining Implementations. Note: http://fimi.uantwerpen.be/data/ Cited by: §5.
 [26] (2020) Improved submodular secretary problem with shortlists. CoRR abs/2010.01901. External Links: Link, 2010.01901 Cited by: §1.
Appendix
Appendix A Missing proofs from Section 2
We begin with a simple lemma showing that the values of the levels are monotone:
Lemma A.1.
For all and , and .
Proof.
First, we note that the second part of the lemma holds by lines 15–16. Let and be the value of and in Algorithm 2 on line 9 on window . Consider a window . There are two cases, depending on whether an element was added to the solutions or not. Suppose no element was added to the solution. Then all the levels remain the same. Line 15 guarantees that . Since no elements were added, so for every level . Now suppose an element was added in window . For levels and , , so . For levels , at the end of line 13, and its value only improves through lines 15–16. Thus . ∎
The rest of the proofs below correspond directly to unproven lemmas in Section 2.
Lemma 2.1.
Suppose is streamed according to a permutation chosen at random and we partition by Algorithm 1 into windows. This is equivalent to assigning each to one of different buckets uniformly and independently at random.
Proof.
The way we define the window sizes is equivalent to placing each element independently into a random bucket, and letting be the number of elements in bucket . Hence the distribution of window sizes is correct. Conditioned on the window sizes, the assignment of elements into windows is determined by a random permutation; any partition is equally likely. Therefore the distribution of elements into windows is equivalent to placing each element into a random window independently. ∎
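The equivalence in Lemma 2.1 can be checked exactly on tiny instances. The sketch below is only an illustration, not the paper's Algorithm 1: the function name `window_assignment_distribution` and the parameters `n` (number of elements) and `m` (number of windows) are hypothetical, and it assumes that window sizes are drawn as the bucket counts of `n` independent uniform throws into `m` buckets, as described in the proof. It enumerates every outcome of the throws and every permutation, and accumulates the exact probability of each resulting element-to-window assignment.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def window_assignment_distribution(n, m):
    """Exact distribution over element -> window maps (hypothetical helper).

    Assumed two-step process:
      1. window sizes are the bucket counts of n independent uniform
         throws into m buckets;
      2. a uniformly random permutation of the n elements is cut into
         consecutive windows of those sizes.
    """
    dist = defaultdict(Fraction)          # missing keys start at 0
    throw_prob = Fraction(1, m ** n)      # probability of one throw outcome
    perm_prob = Fraction(1, factorial(n)) # probability of one permutation
    for throws in product(range(m), repeat=n):
        sizes = [throws.count(w) for w in range(m)]
        for perm in permutations(range(n)):
            assignment = [None] * n
            pos = 0
            for w, size in enumerate(sizes):
                for element in perm[pos:pos + size]:
                    assignment[element] = w
                pos += size
            dist[tuple(assignment)] += throw_prob * perm_prob
    return dict(dist)


if __name__ == "__main__":
    dist = window_assignment_distribution(3, 2)
    # Every one of the m**n assignments should have probability m**-n.
    print(all(p == Fraction(1, 2 ** 3) for p in dist.values()))
```

Exact rational arithmetic is used so that the comparison with the uniform distribution is not affected by floating-point rounding. The enumeration grows as m^n · n!, so it is only meant for very small `n` and `m`.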
Definition A.1.
Let be the mapping of to their window indices; i.e. if is in the th window, then . A partition is compatible if the algorithm produces history when streaming the first windows partitioned by .
Lemma 2.2.
Fix a history . For any element , and any , we have
Proof.
Let be any element of and choose any . For each compatible partition with , we begin by showing that we can create another compatible partition by setting and all other values of equal to . In other words, any compatible partition where is in window can be mapped to another where is in window .
Observe that because , must not have been chosen in windows or by the algorithm so far. There are two possible reasons for this: first, windows or could be greater than (further in the future) or equal to window , in which case window trivially does not affect . Second, consider the case when windows or is less than . If this is the case, then element already arrived in the stream but was not selected by the algorithm in any solution. Hence was either never the maximum element found in line 10, or if it was, its marginal value was not sufficient to replace the current solution. In either case, removing or adding to windows and will not change the history : if a different element was chosen for the update in window , this will still be the case; and if no update occurred, this will also still be the case. Finally, observe that the pool of elements for reinsertion will also not be affected, since element was not part of it.
Thus, we may change from to and maintain a compatible partition. Since is equal to everywhere except on , this maps each such partition to a unique partition (and vice versa), establishing a bijection between compatible partitions with and . Consequently, the number of partitions with compatible with is equal to the number of partitions with .
Let be the set of indices where there exist compatible partitions with . The argument above applies to any windows . In particular, contains all windows greater than or equal to , since these windows clearly do not affect . For any element , and , we have
The first and last lines follow from Lemma 2.1, since any partition happens with uniform probability.
Any element must appear in some window, and it is equally likely to be in any of the windows where it could be present without affecting the history . So we have
The lemma follows from noting that . ∎
Lemma 2.3.
Suppose we have streamed up to the th window of the input for some . Then the expected number of active windows seen so far satisfies
Furthermore, the actual number of active windows concentrates around to within with probability .
Proof.
By the definition of an active set, for any and . For , these elements are reintroduced by the algorithm with probability . Hence, for any , without conditioning on . Since the input permutation is uniformly random, and are independent for in any window . Letting , we have
for large enough and . Thus,
Next, note that and are negatively dependent: conditioning on a window being active decreases the number of optimal elements available to the other windows (and conditioning on a window not being active increases the number available to other windows). Thus we have:
Now we can apply the lower-tail bound of Hoeffding’s inequality ([9], Theorem 1.10.12) to get
for a large enough constant .
Similarly, we may apply the upper-tail bound of Hoeffding’s inequality ([9], Corollary 1.10.13) to obtain:
for a large enough constant .
Since , the number of active windows in the first windows is at least and at most with probability at least . ∎
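The concentration claim in Lemma 2.3 rests on Hoeffding-type tail bounds. For reference, the standard additive form for independent variables is written out below. The symbols $X_i$, $n$, and $\lambda$ are placeholders rather than the paper's notation, and the exact statements used in the proof are the ones cited from [9], which may differ in form.

```latex
% Standard Hoeffding bounds (textbook form, independent case).
% Assumption: X_1, ..., X_n are independent with values in [0, 1].
Let $X = \sum_{i=1}^{n} X_i$. Then for every $\lambda \ge 0$,
\begin{align*}
  \Pr\bigl[X \le \mathbb{E}[X] - \lambda\bigr] &\le \exp\!\left(-\frac{2\lambda^2}{n}\right), \\
  \Pr\bigl[X \ge \mathbb{E}[X] + \lambda\bigr] &\le \exp\!\left(-\frac{2\lambda^2}{n}\right).
\end{align*}
```

Taking $\lambda$ on the order of $\sqrt{n \log n}$ makes both right-hand sides polynomially small in $n$, which is the kind of high-probability guarantee the lemma asks for.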
Lemma 2.4.
Let where and are the values of and defined in Algorithm 2 on window . Conditioned on a history and window being active,
(3) 
Proof.
As in the previous lemma, we first note that by the construction of the active set , any appears in with probability exactly , so for any . Also, the appearances of different elements are mutually independent. In particular, we have .
Order the ’s so that for . Given that for some random , we have
(Submodularity)  
(Monotonicity)  
(4) 
The first line follows from the fact that , since contains and the entirety of . The numerator of follows from the fact that if is the maximum, then no elements in valued higher than it can appear in . The denominator is the probability that window is active. (All events are conditioned on the history.) The third line is subtle and follows from Chebyshev’s sum inequality. Let and