
Streaming Robust Submodular Maximization: A Partitioned Thresholding Approach

by Slobodan Mitrović, et al.

We study the classical problem of maximizing a monotone submodular function subject to a cardinality constraint k, with two additional twists: (i) elements arrive in a streaming fashion, and (ii) m items from the algorithm's memory are removed after the stream is finished. We develop a robust submodular algorithm STAR-T. It is based on a novel partitioning structure and an exponentially decreasing thresholding rule. STAR-T makes one pass over the data and retains a short but robust summary. We show that after the removal of any m elements from the obtained summary, a simple greedy algorithm STAR-T-GREEDY that runs on the remaining elements achieves a constant-factor approximation guarantee. In two different data summarization tasks, we demonstrate that it matches or outperforms existing greedy and streaming methods, even if they are allowed the benefit of knowing the removed subset in advance.



1 Introduction

A central challenge in many large-scale machine learning tasks is data summarization: the extraction of a small representative subset out of a large dataset. Applications include image and document summarization [1, 2], influence maximization [3], facility location [4], exemplar-based clustering [5], recommender systems [6], and many more. Data summarization can often be formulated as the problem of maximizing a submodular set function subject to a cardinality constraint.

On small datasets, a popular algorithm is the simple greedy method [7], which produces solutions provably close to optimal. Unfortunately, it requires repeated access to all elements, which makes it infeasible for large-scale scenarios, where the entire dataset does not fit in the main memory. In this setting, streaming algorithms prove to be useful, as they make only a small number of passes over the data and use sublinear space.

In many settings, the extracted representative set is also required to be robust. That is, the objective value should degrade as little as possible when some elements of the set are removed. Such removals may arise for any number of reasons, such as failures of nodes in a network, or user preferences which the model failed to account for; they could even be adversarial in nature.

A robustness requirement is especially challenging for large datasets, where it is prohibitively expensive to reoptimize over the entire data collection in order to find replacements for the removed elements. In some applications, where data is produced so rapidly that most of it is not being stored, such a search for replacements may not be possible at all.

These requirements lead to the following two-stage setting. In the first stage, we wish to solve the robust streaming submodular maximization problem, that is, to find a small representative subset of elements that is robust against any possible removal of up to $m$ elements. In the second (query) stage, after an arbitrary removal of $m$ elements from the summary obtained in the first stage, the goal is to return a representative subset of size at most $k$, using only the precomputed summary rather than the entire dataset.

For example, (i) in the dominating set problem (also studied under influence maximization) we want to efficiently (in a single pass) compute a compressed but robust set of influential users in a social network (whom we will present with free copies of a new product); (ii) in personalized movie recommendation we want to efficiently precompute a robust set of user-preferred movies. Once we discard those users who will not spread the word about our product, we should find a new set of influential users in the precomputed robust summary. Similarly, if some movies turn out not to be interesting for the user, we should still be able to provide good recommendations by looking only into our robust movie summary.


In this paper, we propose a two-stage procedure for robust submodular maximization. For the first stage, we design a streaming algorithm which makes one pass over the data and finds a summary that is robust against removal of up to $m$ elements, while containing at most $O((m \log k + k) \log^2 k)$ elements.

In the second (query) stage, given any set of size $m$ that has been removed from the obtained summary, we use a simple greedy algorithm that runs on the remaining elements and produces a solution of size at most $k$ (without needing to access the entire dataset). We prove that this solution satisfies a constant-factor approximation guarantee.

Achieving this result requires novelty in the algorithm design as well as the analysis. Our streaming algorithm uses a structure where the constructed summary is arranged into partitions consisting of buckets whose sizes increase exponentially with the partition index. Moreover, buckets in different partitions are associated with greedy thresholds, which decrease exponentially with the partition index. Our analysis exploits and combines the properties of the described robust structure and decreasing greedy thresholding rule.

In addition to algorithmic and theoretical contributions, we also demonstrate in several practical scenarios that our procedure matches (and in some cases outperforms) the Sieve-Streaming algorithm [8] (see Section 5) – even though we allow the latter to know in advance which elements will be removed from the dataset.

2 Problem Statement

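We consider a potentially large universe of elements $V$ of size $n$ equipped with a normalized monotone submodular set function $f : 2^V \to \mathbb{R}_{\ge 0}$ defined on $V$. We say that $f$ is monotone if for any two sets $X \subseteq Y \subseteq V$ we have $f(X) \le f(Y)$. The set function $f$ is said to be submodular if for any two sets $X \subseteq Y \subseteq V$ and any element $e \in V \setminus Y$ it holds that

$$f(X \cup \{e\}) - f(X) \ge f(Y \cup \{e\}) - f(Y).$$

We use $f(Y \mid X)$ to denote the marginal gain in the function value due to adding the elements of set $Y$ to set $X$, i.e. $f(Y \mid X) := f(X \cup Y) - f(X)$. We say that $f$ is normalized if $f(\emptyset) = 0$.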
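The definitions above can be checked by brute force on a tiny instance. The following is a minimal sketch, assuming a toy coverage objective; the data in `COVER` and all function names are hypothetical and serve only as illustration.

```python
from itertools import combinations

# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}

def f(S):
    """Coverage objective: number of items covered by S (f(empty set) = 0)."""
    covered = set()
    for e in S:
        covered |= COVER[e]
    return len(covered)

def marginal_gain(Y, X):
    """f(Y | X) = f(X ∪ Y) - f(X)."""
    return f(set(X) | set(Y)) - f(X)

def subsets(ground):
    """Enumerate all subsets of the ground set (exponential; toy sizes only)."""
    ground = list(ground)
    for r in range(len(ground) + 1):
        for combo in combinations(ground, r):
            yield set(combo)

def is_monotone_submodular(ground):
    """Brute-force check of monotonicity and diminishing returns."""
    for Y in subsets(ground):
        for X in subsets(Y):  # every X ⊆ Y
            if f(X) > f(Y):
                return False  # monotonicity violated
            for e in set(ground) - Y:
                if marginal_gain({e}, X) < marginal_gain({e}, Y):
                    return False  # diminishing returns violated
    return True
```

The exhaustive check is exponential in the size of the ground set, so it is only meant to make the two inequalities concrete.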

The problem of maximizing a monotone submodular function subject to a cardinality constraint, i.e.,

$$\max_{Z \subseteq V,\ |Z| \le k} f(Z), \qquad (1)$$

has been studied extensively. It is well-known that a simple greedy algorithm (henceforth referred to as Greedy) [7], which starts from an empty set and then iteratively adds the element with the highest marginal gain, provides a $(1 - 1/e)$-approximation. However, it requires repeated access to all elements of the dataset, which precludes it from use in large-scale machine learning applications.
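The greedy rule described here can be sketched in a few lines. This is an illustrative version on a hypothetical coverage objective (`COVER`, `f` and `greedy` are names chosen for this example); it stops early once no element has positive marginal gain, which does not change the objective value for monotone functions.

```python
# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}

def f(S):
    """Coverage objective: number of items covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= COVER[e]
    return len(covered)

def greedy(f, ground, k):
    """Start from the empty set and add the element of highest marginal gain, k times."""
    S = []
    remaining = set(ground)
    for _ in range(k):
        base = f(S)
        best, best_gain = None, 0
        for e in sorted(remaining):  # sorted only to make tie-breaking deterministic
            gain = f(S + [e]) - base
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:  # no element improves the objective any further
            break
        S.append(best)
        remaining.discard(best)
    return S
```

Each of the k rounds scans every remaining element, which is the repeated access to the whole dataset that the streaming setting rules out.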

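We say that a set $S$ is robust for a parameter $m$ if, for any set $E \subseteq V$ such that $|E| \le m$, there is a subset $Z \subseteq S \setminus E$ of size at most $k$ such that

$$f(Z) \ge c \, f(\mathrm{OPT}(k, V \setminus E)),$$

where $c > 0$ is an approximation ratio. We use $\mathrm{OPT}(k, V \setminus E)$ to denote the optimal subset of size $k$ of $V \setminus E$ (i.e., after the removal of elements in $E$):

$$\mathrm{OPT}(k, V \setminus E) \in \operatorname{argmax}_{Z \subseteq V \setminus E,\ |Z| \le k} f(Z).$$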
In this work, we are interested in solving a robust version of Problem (1) in the setting that consists of the following two stages: (i) streaming and (ii) query stage.

In the streaming stage, elements from the ground set $V$ arrive in a streaming fashion in an arbitrary order. Our goal is to design a one-pass streaming algorithm that has oracle access to $f$ and retains a small set $S$ of elements in memory. In addition, we want $S$ to be a robust summary, i.e., $S$ should both contain elements that maximize the objective value and be robust against the removal of a prespecified number of elements $m$. In the query stage, after any set $E$ of size at most $m$ is removed from $V$, the goal is to return a set $Z \subseteq S \setminus E$ of size at most $k$ such that $f(Z)$ is maximized.

Figure 1: Illustration of the set $S$ returned by STAR-T. It consists of $\lceil \log k \rceil + 1$ partitions such that each partition $i$ contains $w \lceil k/2^i \rceil$ buckets of size $2^i$ (up to rounding). Moreover, each partition $i$ has its corresponding threshold $\tau / 2^i$.

Related work. A robust, non-streaming version of Problem (1) was first introduced in [9]. In that setting, the algorithm must output a set $S$ of size $k$ which maximizes the smallest objective value guaranteed to be obtained after a set of size $m$ is removed, that is,

$$\max_{S \subseteq V,\ |S| \le k} \ \min_{E \subseteq S,\ |E| \le m} f(S \setminus E).$$
The work [10] provides the first constant () factor approximation result to this problem, valid for . Their solution consists of buckets of size that are constructed greedily, one after another. Recently, in [11], a centralized algorithm PRo has been proposed that achieves the same approximation result and allows for a greater robustness . PRo constructs a set that is arranged into partitions consisting of buckets whose sizes increase exponentially with the partition index. In this work, we use a similar structure for the robust set but, instead of filling the buckets greedily one after another, we place an element in the first bucket for which the gain of adding the element is above the corresponding threshold. Moreover, we introduce a novel analysis that allows us to be robust to any number of removals as long as we are allowed to use memory.

Recently, submodular streaming algorithms (e.g. [5], [12], and [13]) have become a prominent option for scaling submodular optimization to large-scale machine learning applications. A popular submodular streaming algorithm, Sieve-Streaming [8], solves Problem (1) by performing one pass over the data, and achieves a $(1/2 - \epsilon)$-approximation while storing at most $O(k \log k / \epsilon)$ elements.
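To make the thresholding idea concrete, here is a rough sketch of the acceptance rule that Sieve-Streaming [8] applies for a single guess `v` of the optimal value. The full algorithm maintains many such guesses in parallel. The toy objective and all names below are hypothetical.

```python
# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {4, 5, 6}, "d": {7}}

def f(S):
    """Coverage objective: number of items covered by the elements of S."""
    covered = set()
    for e in S:
        covered |= COVER[e]
    return len(covered)

def sieve_single_guess(stream, f, k, v):
    """One pass with a fixed guess v of the optimum.

    An element is kept if its marginal gain is at least
    (v/2 - f(S)) / (k - |S|), i.e. enough to stay on track for value v/2.
    """
    S = []
    for e in stream:
        if len(S) < k:
            needed = (v / 2 - f(S)) / (k - len(S))
            if f(S + [e]) - f(S) >= needed:
                S.append(e)
    return S
```

On this toy stream with `k = 2` and `v = 5`, the rule keeps the first two elements and reaches value 3, which is at least `v / 2`, although the optimum of 5 would require element `c`.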

Our algorithm extends the algorithmic ideas of Sieve-Streaming, such as greedy thresholding, to the robust setting. In particular, we introduce a new exponentially decreasing thresholding scheme that, together with an innovative analysis, allows us to obtain a constant-factor approximation for the robust streaming problem.

Recently, robust versions of submodular maximization have been considered in the problems of influence maximization (e.g., [3], [14]) and budget allocation ([15]). Increased interest in interactive machine learning methods has also led to the development of interactive and adaptive submodular optimization (see e.g. [16], [17]). Our procedure also contains an interactive component, as we can compute the robust summary only once and then provide different sub-summaries that correspond to multiple different removals (see Section 5.2).

Independently and concurrently with our work, [18] gave a streaming algorithm for robust submodular maximization under the cardinality constraint. Their approach provides a approximation guarantee. However, their algorithm uses memory. While the memory requirement of their method increases linearly with , in the case of our algorithm this dependence is logarithmic.

3 A Robust Two-Stage Procedure

Our approach consists of the streaming Algorithm 1, which we call Streaming Robust submodular algorithm with Partitioned Thresholding (STAR-T). This algorithm is used in the streaming stage, while Algorithm 2, which we call STAR-T-Greedy, is used in the query stage.

As the input, STAR-T requires a non-negative monotone submodular function $f$, cardinality constraint $k$, robustness parameter $m$ and thresholding parameter $\tau$. The parameter $\tau$ is an $\alpha$-approximation to $f(\mathrm{OPT}(k, V \setminus E))$, for some $\alpha \in (0, 1]$ to be specified later. Hence, it depends on $f(\mathrm{OPT}(k, V \setminus E))$, which is not known a priori. For the sake of clarity, we present the algorithm as if $f(\mathrm{OPT}(k, V \setminus E))$ were known, and in Section 4.1 we show how $f(\mathrm{OPT}(k, V \setminus E))$ can be approximated. The algorithm makes one pass over the data and outputs a set of elements $S$ that is later used in the query stage in STAR-T-Greedy.

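The set $S$ (see Figure 1 for an illustration) is divided into $\lceil \log k \rceil + 1$ partitions, where every partition $i \in \{0, \dots, \lceil \log k \rceil\}$ consists of $w \lceil k/2^i \rceil$ buckets $B_{i,j}$, $j \in \{1, \dots, w \lceil k/2^i \rceil\}$. Here, $w \in \mathbb{N}_+$ is a memory parameter that depends on $m$; we use $w \ge \lceil 4 \lceil \log k \rceil m / k \rceil$ in our asymptotic theory, while our numerical results show that $w = 1$ works well in practice. Every bucket $B_{i,j}$ stores at most $\min\{k, 2^i\}$ elements. If $|B_{i,j}| = \min\{2^i, k\}$, then we say that $B_{i,j}$ is full.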

Every partition has a corresponding threshold that is exponentially decreasing with the partition index $i$ as $\tau / 2^i$. For example, the buckets in the first partition will only store elements that have marginal value at least $\tau$. Every element $e \in V$ arriving on the stream is assigned to the first non-full bucket $B_{i,j}$ for which the marginal value $f(e \mid B_{i,j})$ is at least $\tau / 2^i$. If there is no such bucket, the element will not be stored. Hence, the buckets are disjoint sets that in the end (after one pass over the data) can have a smaller number of elements than specified by their corresponding cardinality constraints, and some of them might even be empty. The set $S$ returned by STAR-T is the union of all the buckets.
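The placement rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that partition `i` (for `i = 0, ..., ceil(log2 k)`) holds `w * ceil(k / 2^i)` buckets of capacity `min(2^i, k)` with threshold `tau / min(2^i, k)`. The toy objective and all identifiers are hypothetical.

```python
import math

# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {5}, "d": {3, 4}, "e": {6, 7}}

def f(S):
    """Coverage objective: number of items covered by the elements of S."""
    covered = set()
    for x in S:
        covered |= COVER[x]
    return len(covered)

def star_t(stream, f, k, tau, w=1):
    """One pass over the stream; returns (buckets, summary)."""
    num_partitions = math.ceil(math.log2(k)) + 1
    # Assumed layout: partition i has w * ceil(k / 2^i) buckets.
    buckets = [
        [[] for _ in range(w * math.ceil(k / 2**i))]
        for i in range(num_partitions)
    ]
    for e in stream:
        placed = False
        for i, partition in enumerate(buckets):      # loop over partitions
            capacity = min(2**i, k)
            threshold = tau / capacity
            for bucket in partition:                 # loop over buckets
                gain = f(bucket + [e]) - f(bucket)
                if len(bucket) < capacity and gain >= threshold:
                    bucket.append(e)
                    placed = True
                    break
            if placed:
                break                                # next stream element
        # Elements that fit no bucket are simply discarded.
    summary = [e for partition in buckets for bucket in partition for e in bucket]
    return buckets, summary
```

With `k = 2`, `tau = 4` and `w = 1`, the stream `a, b, c, d, e` places `a` in the first partition and `b`, `d` in the second; `c` and `e` fall below every threshold or meet only full buckets and are dropped.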

In the second stage, STAR-T-Greedy receives as input the set $S$ constructed in the streaming stage, a set $E$ that we think of as removed elements, and the cardinality constraint $k$. The algorithm then returns a set $Z$, of size at most $k$, that is obtained by running the simple greedy algorithm Greedy on the set $S \setminus E$. Note that STAR-T-Greedy can be invoked for different sets $E$.

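1: Input: set $V$, $k$, $\tau$, $w \in \mathbb{N}_+$
2: $B_{i,j} \leftarrow \emptyset$ for all $0 \le i \le \lceil \log k \rceil$ and $1 \le j \le w \lceil k/2^i \rceil$
3: for each element $e$ in the stream do
4:     for $i \leftarrow 0$ to $\lceil \log k \rceil$ do   ▷ loop over partitions
5:         for $j \leftarrow 1$ to $w \lceil k/2^i \rceil$ do   ▷ loop over buckets
6:              if $|B_{i,j}| < \min\{2^i, k\}$ and $f(e \mid B_{i,j}) \ge \tau / \min\{2^i, k\}$ then
7:                  $B_{i,j} \leftarrow B_{i,j} \cup \{e\}$
8:                  break: proceed to the next element in the stream
9: $S \leftarrow \bigcup_{i,j} B_{i,j}$
10: return $S$
Algorithm 1 STreAming Robust - Thresholding submodular algorithm (STAR-T)
1: Input: set $S$, query set $E$ and $k$
2: $Z \leftarrow$ Greedy$(k, S \setminus E)$
3: return $Z$
Algorithm 2 STAR-T-Greedy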
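The query stage amounts to running Greedy on the part of the summary that survives the removal. A minimal sketch, assuming a toy coverage objective and a summary given as a plain list (all names are hypothetical):

```python
# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2, 3, 4}, "b": {1, 2}, "d": {3, 4}}

def f(S):
    """Coverage objective: number of items covered by the elements of S."""
    covered = set()
    for x in S:
        covered |= COVER[x]
    return len(covered)

def star_t_greedy(summary, removed, f, k):
    """Run the simple greedy algorithm on the summary minus the removed set."""
    removed = set(removed)
    remaining = [e for e in summary if e not in removed]
    Z = []
    for _ in range(k):
        base = f(Z)
        gains = [(f(Z + [e]) - base, e) for e in remaining if e not in Z]
        if not gains:
            break
        gain, best = max(gains, key=lambda t: t[0])  # first maximum wins ties
        if gain <= 0:
            break  # nothing left that improves the objective
        Z.append(best)
    return Z
```

Because the summary is computed once, the same list can be queried with several different removed sets, for example `star_t_greedy(summary, {"a"}, f, 2)` and `star_t_greedy(summary, set(), f, 2)`.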

4 Theoretical Bounds

In this section we discuss our main theoretical results. We initially assume that the value $f(\mathrm{OPT}(k, V \setminus E))$ is known; later, in Section 4.1, we remove this assumption. More detailed versions of our proofs are given in the supplementary material. We begin by stating the main result.

Theorem 4.1

Let be a normalized monotone submodular function defined over the ground set . Given a cardinality constraint and parameter , for a setting of parameters and

STAR-T performs a single pass over the data set and constructs a set of size at most elements.

For such a set and any set such that , STAR-T-Greedy yields a set of size at most with

for . Therefore, as , the value of approaches .

Proof sketch.

We first consider the case when there is a partition in such that at least half of its buckets are full. We show that there is at least one full bucket such that is only a constant factor smaller than , as long as the threshold is set close to . We make this statement precise in the following lemma:

Lemma 4.2

If there exists a partition in such that at least half of its buckets are full, then for the set produced by STAR-T-Greedy we have


To prove this lemma, we first observe that from the properties of Greedy it follows that

Now it remains to show that is close to . We observe that for any full bucket , we have , so its objective value is at least (every element added to this bucket increases its objective value by at least ). On average, is relatively small, and hence we can show that there exists some full bucket such that is close to .

Next, we consider the other case, i.e., when for every partition, more than half of its buckets are not full after the execution of STAR-T. For every partition , we let denote a bucket that is not fully populated and for which is minimized over all the buckets of that partition. Then, we look at such a bucket in the last partition: .

We provide two lemmas that depend on . If is set to be small compared to :

  • Lemma 4.3 shows that if is close to , then our solution is within a constant factor of ;

  • Lemma 4.4 shows that if is small compared to , then our solution is again within a constant factor of .

Lemma 4.3

If there does not exist a partition of such that at least half of its buckets are full, then for the set produced by STAR-T-Greedy we have

where is a not-fully-populated bucket in the last partition that minimizes and .

Using standard properties of submodular functions and the Greedy algorithm we can show that

The complete proof of this result can be found in Lemma B.2, in the supplementary material.

Lemma 4.4

If there does not exist a partition of such that at least half of its buckets are full, then for the set produced by STAR-T-Greedy,

where is any not-fully-populated bucket in the last partition.

To prove this lemma, we look at two sets and , where contains all the elements from that are placed in the buckets that precede bucket in , and set . By monotonicity and submodularity of , we bound by:

To bound the sum on the right hand side we use that for every we have , which holds due to the fact that is a bucket in the last partition and is not fully populated.

We conclude the proof by showing that .

Equipped with the above results, we proceed to prove our main result.

Proof of Theorem 4.1. First, we prove the bound on the size of :


By setting we obtain .
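The size bound can be traced by summing bucket capacities. The derivation below is a sketch under the assumption that partition $i \in \{0, \dots, L\}$, with $L = \lceil \log_2 k \rceil$, holds $w \lceil k/2^i \rceil$ buckets of capacity $\min\{2^i, k\}$; the constants are illustrative and may differ from those in the supplementary material.

```latex
\begin{align*}
|S| &\le \sum_{i=0}^{L} w \left\lceil \frac{k}{2^i} \right\rceil \min\{2^i, k\}
     \le \sum_{i=0}^{L} w \left( \frac{k}{2^i} + 1 \right) 2^i \\
    &= w \left( (L+1)\,k + 2^{L+1} - 1 \right)
     \le w\,k\,(L + 5),
\end{align*}
% The last step uses 2^L < 2k, hence 2^{L+1} < 4k.
```

Under this assumption, a choice of $w$ of order $1 + m \log k / k$ gives $|S| = O((k + m \log k) \log k)$.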

Next, we show the approximation guarantee. We first define , , and . Lemma 4.3 and 4.4 provide two bounds on , one increasing and one decreasing in . By balancing out the two bounds, we derive


with equality for .

Next, as , we can observe that Eq. (4) is decreasing, while the bound on given by Lemma 4.2 is increasing in for . Hence, by balancing out the two inequalities, we obtain our final bound


For we have , and hence, by substituting and in Eq. (5), we prove our main result:

4.1 Algorithm without access to $f(\mathrm{OPT}(k, V \setminus E))$

Algorithm STAR-T requires in its input a parameter $\tau$ which is a function of an unknown value $f(\mathrm{OPT}(k, V \setminus E))$. To deal with this shortcoming, we show how to extend the idea of [8] of maintaining multiple parallel instances of our algorithm in order to approximate $f(\mathrm{OPT}(k, V \setminus E))$. For a given constant $\epsilon > 0$, this approach increases the space by a factor of $\log_{1+\epsilon} k$ and provides a $(1+\epsilon)$-approximation compared to the value obtained in Theorem 4.1. More precisely, we prove the following theorem.
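The idea of running parallel instances over a geometric grid of guesses can be illustrated as follows. This sketch only builds the grid of candidate values and is not the procedure of Appendix E: the bounds `lo` and `hi` on the unknown optimum are assumed to be given, and the function name is hypothetical.

```python
import math

def threshold_grid(lo, hi, eps):
    """Return all values (1 + eps)^i that lie in the interval [lo, hi].

    One streaming instance would be run per returned value, so the number
    of instances grows like log(hi / lo) / log(1 + eps).
    """
    i_min = math.ceil(math.log(lo) / math.log1p(eps))
    i_max = math.floor(math.log(hi) / math.log1p(eps))
    return [(1 + eps) ** i for i in range(i_min, i_max + 1)]
```

At least one grid value is within a factor of `1 + eps` of any target in `[lo, hi]`, which is why the approximation guarantee degrades only by that factor.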

Theorem 4.5

For any given constant there exists a parallel variant of STAR-T that makes one pass over the stream and outputs a collection of sets of total size with the following property: There exists a set such that applying STAR-T-Greedy on yields a set of size at most with

The proof of this theorem, along with a description of the corresponding algorithm, is provided in Appendix E.

5 Experiments

In this section, we numerically validate the claims outlined in the previous section. Namely, we test the robustness and compare the performance of our algorithm against the Sieve-Streaming algorithm that knows in advance which elements will be removed. We demonstrate improved or matching performance in two different data summarization applications: (i) the dominating set problem, and (ii) personalized movie recommendation. We illustrate how a single robust summary can be used to regenerate recommendations corresponding to multiple different removals.

5.1 Dominating Set

In the dominating set problem, given a graph $G = (V, M)$, where $V$ represents the set of nodes and $M$ stands for edges, the objective function is given by $f(Z) = |N(Z) \cup Z|$, where $N(Z)$ denotes the neighborhood of $Z$ (all nodes adjacent to any node of $Z$). This objective function is monotone and submodular.
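As a small illustration, the dominating set objective can be evaluated from an adjacency map. The sketch below assumes the objective counts the chosen nodes together with their out-neighbors; the graph and the function name are hypothetical.

```python
def dominating_set_value(adj, Z):
    """Number of nodes that are in Z or adjacent to some node of Z.

    adj maps each node to the set of its (out-)neighbors.
    """
    covered = set(Z)
    for u in Z:
        covered |= adj.get(u, set())
    return len(covered)

# Hypothetical toy directed graph.
ADJ = {1: {2, 3}, 2: {3}, 3: set(), 4: {1}}
```

For instance, `dominating_set_value(ADJ, {1})` counts nodes 1, 2 and 3, and adding node 4 to the set raises the value by one.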

We consider two datasets: (i) ego-Twitter [19], consisting of 973 social circles from Twitter, which form a directed graph with nodes and edges; (ii) Amazon product co-purchasing network [20]: a directed graph with nodes and edges.

Given the dominating set objective function, we run STAR-T to obtain the robust summary $S$. Then we compare the performance of STAR-T-Greedy, which runs on $S$, against the performance of Sieve-Streaming, which we allow to know in advance which elements will be removed. We also compare against a method that chooses the same number of elements as STAR-T, but does so uniformly at random from the set of all elements that will not be removed ($V \setminus E$); we refer to it as Random. Finally, we also demonstrate the performance of STAR-T-Sieve, a variant of our algorithm that uses the same robust summary $S$, but instead of running Greedy in the second stage, it runs Sieve-Streaming on $S \setminus E$.

Figures 2(a,c) show the objective value after the random removal of elements from the set , for different values of . Note that is sampled as a subset of the summary of our algorithm, which hurts the performance of our algorithm more than the baselines. The reported numbers are averaged over iterations. STAR-T-Greedy, STAR-T-Sieve and Sieve-Streaming perform comparably (STAR-T-Greedy slightly outperforms the other two), while Random is significantly worse.

In Figures 2(b,d) we plot the objective value for different values of after the removal of elements from the set , chosen greedily (i.e., by iteratively removing the element that reduces the objective value the most). Again, STAR-T-Greedy, STAR-T-Sieve and Sieve-Streaming perform comparably, but this time Sieve-Streaming slightly outperforms the other two for some values of . We observe that even when we remove more than elements from , the performance of our algorithm is still comparable to the performance of Sieve-Streaming (which knows in advance which elements will be removed). We provide additional results in the supplementary material.
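The greedy removal strategy mentioned here (iteratively removing the element whose deletion reduces the objective the most) can be sketched as below. The toy coverage objective and all names are hypothetical and are not taken from the experimental code.

```python
# Hypothetical toy data: each element covers a set of items.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}

def f(S):
    """Coverage objective: number of items covered by the elements of S."""
    covered = set()
    for x in S:
        covered |= COVER[x]
    return len(covered)

def greedy_removal(f, S, m):
    """Remove m elements one at a time, each time the most damaging one."""
    remaining = list(S)
    removed = []
    for _ in range(min(m, len(remaining))):
        # The most damaging element is the one whose removal leaves the
        # smallest objective value on the rest (first such element on ties).
        worst = min(remaining, key=lambda e: f([x for x in remaining if x != e]))
        remaining.remove(worst)
        removed.append(worst)
    return removed, remaining
```

This adversary is itself only a heuristic: it picks removals one by one rather than searching over all subsets of size m.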

Figure 2: Numerical comparisons of the algorithms STAR-T-Greedy, STAR-T-Sieve and Sieve-Streaming.

5.2 Interactive Personalized Movie Recommendation

The next application we consider is personalized movie recommendation. We use the MovieLens 1M database [21], which contains 1,000,209 ratings for 3,900 movies by 6,040 users. Based on these ratings, we obtain feature vectors for each movie and each user by using standard low-rank matrix completion techniques [22]; we choose the number of features to be .

For a user $u$, we use the following monotone submodular function to recommend a set of movies $Z$:

$$f_u(Z) = (1 - \alpha) \sum_{z \in Z} \langle v_u, v_z \rangle + \alpha \sum_{j \in M} \max_{z \in Z} \langle v_j, v_z \rangle.$$

The first term aggregates the predicted scores of the chosen movies $z \in Z$ for the user $u$ (here $v_u$ and $v_z$ are non-normalized feature vectors of user $u$ and movie $z$, respectively). The second term corresponds to a facility-location objective that measures how well the set $Z$ covers the set of all movies $M$ [4]. Finally, $\alpha$ is a user-dependent parameter that specifies the importance of global movie coverage versus high scores of individual movies.
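The two terms described above can be sketched in code. The form used here, a weighted sum of a relevance term and a facility-location term with weight `alpha`, is an assumption inferred from the prose description, and the feature vectors are hypothetical. The function is monotone only when the inner products involved are non-negative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def recommendation_value(Z, v_user, movie_vecs, alpha):
    """Assumed objective: (1 - alpha) * relevance + alpha * coverage.

    relevance: sum of predicted scores <v_user, v_z> over chosen movies z.
    coverage:  for every movie j, similarity to its best representative in Z.
    """
    if not Z:
        return 0.0
    relevance = sum(dot(v_user, movie_vecs[z]) for z in Z)
    coverage = sum(
        max(dot(movie_vecs[j], movie_vecs[z]) for z in Z) for j in movie_vecs
    )
    return (1 - alpha) * relevance + alpha * coverage

# Hypothetical two-dimensional feature vectors.
MOVIES = {"x": (1, 0), "y": (0, 1), "z": (1, 1)}
USER = (2, 1)
```

Setting `alpha` close to 1 favors sets that represent the whole catalogue, while `alpha` close to 0 favors the movies with the highest individual predicted scores.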

Here, the robust setting arises naturally since we do not have complete information about the user: when shown a collection of top movies, it will likely turn out that they have watched (but not rated) many of them, rendering these recommendations moot. In such an interactive setting, the user may also require (or exclude) movies of a specific genre, or similar to some favorite movie.

We compare the performance of our algorithms STAR-T-Greedy and STAR-T-Sieve in such scenarios against two baselines: Greedy and Sieve-Streaming (both being run on the set , i.e., knowing the removed elements in advance). Note that in this case we are able to afford running Greedy, which may be infeasible when working with larger datasets. Below we discuss two concrete practical scenarios featured in our experiments.

Movies by genre.

After we have built our summary , the user decides to watch a drama today; we retrieve only movies of this genre from . This corresponds to removing of the universe . In Figure 2(f) we report the quality of our output compared to the baselines (for user ID and ) for different values of . The performance of STAR-T-Greedy is within several percent of the performance of Greedy (which we can consider as a tractable optimum), and the two sieve-based methods STAR-T-Sieve and Sieve-Streaming display similar objective values.

Already-seen movies.

We randomly sample a set of movies already watched by the user ( out of all movies). To obtain a realistic subset, each movie is sampled proportionally to its popularity (number of ratings). Figure 2(e) shows the performance of our algorithm faced with the removal of (user ID , ) for a range of settings of . Again, our algorithm is able to almost match the objective values of Greedy (which is aware of in advance).

Recall that we are able to use the same precomputed summary for different removed sets . This summary was built for parameter , which theoretically allows for up to removals. However, despite having in the above scenarios, our performance remains robust; this indicates that our method is more resilient in practice than what the proved bound alone would guarantee.

6 Conclusion

We have presented a new robust submodular streaming algorithm STAR-T based on a novel partitioning structure and an exponentially decreasing thresholding rule. It makes one pass over the data and retains a set of size $O((m \log k + k) \log^2 k)$. We have further shown that after the removal of any $m$ elements, a simple greedy algorithm that runs on the obtained set achieves a constant-factor approximation guarantee for robust submodular function maximization. In addition, we have presented two numerical studies where our method compares favorably against the Sieve-Streaming algorithm that knows in advance which elements will be removed.


IB and VC’s work was supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement number 725594), in part by the Swiss National Science Foundation (SNF), project 407540_167319/1, in part by the NCCR MARVEL, funded by the Swiss National Science Foundation, in part by Hasler Foundation Switzerland under grant agreement number 16066 and in part by Office of Naval Research (ONR) under grant agreement number N00014-16-R-BA01. JT’s work was supported by ERC Starting Grant 335288-OptApprox.


  • [1] S. Tschiatschek, R. K. Iyer, H. Wei, and J. A. Bilmes, “Learning mixtures of submodular functions for image collection summarization,” in Advances in neural information processing systems, 2014, pp. 1413–1421.
  • [2] H. Lin and J. Bilmes, “A class of submodular functions for document summarization,” in Assoc. for Comp. Ling.: Human Language Technologies-Volume 1, 2011.
  • [3] D. Kempe, J. Kleinberg, and É. Tardos, “Maximizing the spread of influence through a social network,” in Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), 2003.
  • [4] E. Lindgren, S. Wu, and A. G. Dimakis, “Leveraging sparsity for efficient submodular data summarization,” in Advances in Neural Information Processing Systems, 2016, pp. 3414–3422.
  • [5] A. Krause and R. G. Gomes, “Budgeted nonparametric learning from data streams,” in ICML, 2010, pp. 391–398.
  • [6] K. El-Arini and C. Guestrin, “Beyond keyword search: discovering relevant scientific literature,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining.   ACM, 2011, pp. 439–447.
  • [7] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions—i,” Mathematical Programming, vol. 14, no. 1, pp. 265–294, 1978.
  • [8] A. Badanidiyuru, B. Mirzasoleiman, A. Karbasi, and A. Krause, “Streaming submodular maximization: Massive data summarization on the fly,” in Proceedings of the 20th ACM SIGKDD.   ACM, 2014, pp. 671–680.
  • [9] A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta, “Robust submodular observation selection,” Journal of Machine Learning Research, vol. 9, no. Dec, pp. 2761–2801, 2008.
  • [10] J. B. Orlin, A. S. Schulz, and R. Udwani, “Robust monotone submodular function maximization,” in Int. Conf. on Integer Programming and Combinatorial Opt. (IPCO).   Springer, 2016.
  • [11] I. Bogunovic, S. Mitrović, J. Scarlett, and V. Cevher, “Robust submodular maximization: A non-uniform partitioning approach,” in Int. Conf. Mach. Learn. (ICML), 2017.
  • [12] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani, “Fast greedy algorithms in MapReduce and streaming,” ACM Transactions on Parallel Computing, vol. 2, no. 3, p. 14, 2015.
  • [13] A. Norouzi-Fard, A. Bazzi, I. Bogunovic, M. El Halabi, Y.-P. Hsieh, and V. Cevher, “An efficient streaming algorithm for the submodular cover problem,” in Adv. Neur. Inf. Proc. Sys. (NIPS), 2016.
  • [14] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou, “Robust influence maximization,” in Proceedings of the ACM SIGKDD, 2016, p. 795.
  • [15] M. Staib and S. Jegelka, “Robust budget allocation via continuous submodular functions,” in Int. Conf. Mach. Learn. (ICML), 2017.
  • [16] D. Golovin and A. Krause, “Adaptive submodularity: Theory and applications in active learning and stochastic optimization,” Journal of Artificial Intelligence Research, vol. 42, 2011.
  • [17] A. Guillory and J. Bilmes, “Interactive submodular set cover,” arXiv preprint arXiv:1002.3345, 2010.
  • [18] B. Mirzasoleiman, A. Karbasi, and A. Krause, “Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”,” in International Conference on Machine Learning, 2017, pp. 2449–2458.
  • [19] J. Mcauley and J. Leskovec, “Discovering social circles in ego networks,” ACM Trans. Knowl. Discov. Data, 2014.
  • [20] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
  • [21] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, p. 19, 2016.
  • [22] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, “Missing value estimation methods for DNA microarrays,” Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001.

Appendix A Detailed Proof of Lemma 4.2

See Lemma 4.2.

Proof. Let be a partition such that half of its buckets are full. Let be a full bucket that minimizes . In STAR-T, every partition contains buckets. Hence, the number of full buckets in partition is at least . That further implies


Taking into account that is a full bucket, we conclude


From the property of our Algorithm (line 6) every element added to increased the utility of this bucket by at least . Combining this with the fact that is full, we conclude that the gain of every element in this bucket is at least . Therefore, from Eq. (7) it follows:


Taking into account that this further reduces to




where Eq. (10) follows from , Eq. (11) follows from the fact that , and Eq. (12) follows from Eq. (9).

Appendix B Detailed Proof of Lemma 4.3

We start by studying some properties of that we use in the proof of Lemma 4.3.

Lemma B.1

Let be a bucket in partition , and let denote the elements that are removed from this bucket. Given a bucket from the previous partition such that (i.e. is not fully populated), the loss in the bucket due to the removals is at most

Proof. First, we can bound as follows


Consider a single element . There are two possible cases: , and . In the first case, . In the second one, as we conclude , as otherwise the streaming algorithm would place in . These observations together with (13) imply:

Lemma B.2

For every partition , let denote a bucket such that (i.e. no partition is fully populated), and let denote the elements that are removed from . The loss in the bucket due to the removals, given all the remaining elements in the previous buckets, is at most

Proof. We proceed by induction. More precisely, we show that for any the following holds


Once we show that (14) holds, the lemma will follow immediately by setting .

Base case .

Since is not fully populated and the maximum number of elements in the partition is , it follows that both and are empty. Then the term on the left hand side of (14) for becomes . As we can apply Lemma B.1 to obtain

Inductive step .

Now we show that (14) holds for , assuming that it holds for . First, due to submodularity we have

and, hence, we can write


Due to monotonicity, the first term can be further bounded by


and for the third term we have


where to obtain the identity we used that .

By substituting the obtained bounds (16) and (17) in (15) we obtain:


where the second inequality follows by submodularity.

Next, Lemma B.1 can be used (as ) to bound the first term in (18):


To conclude the proof, we use the inductive hypothesis that (14) holds for , which together with (19) implies

as desired.

See Lemma 4.3.

Proof. Let denote a bucket in partition which is not fully populated (), and for which , where , is of minimum cardinality. Such bucket exists in every partition due to the assumption of the lemma that more than a half of the buckets are not fully populated.



where Eq. (20) follows from Lemma D.1 by setting , and . As we consider buckets that are not fully populated, Lemma B.2 is used to obtain Eq. (21). Next, we bound each term in Eq. (21) independently.

From Algorithm 1 we have that partition consists of buckets. By the assumption of the lemma, more than half of those are not fully populated. Recall that is defined to be a bucket of partition which is not fully populated and which minimizes . Let be the subset of that intersects buckets of partition . Then, can be bounded as follows:

Hence, the sum on the left hand side of Eq. (21) can be bounded as

Putting the last inequality together with Eq. (21) we obtain

Observe also that

which implies



as desired.

Appendix C Detailed Proof of Lemma 4.4

See Lemma 4.4.

Proof. Let denote a bucket in the last partition which is not fully populated. Such bucket exists due to the assumption of the lemma that more than a half of the buckets are not fully populated.

Let and be two sets such that contains all the elements from that are placed in the buckets that precede bucket in , and let . In that case, for every we have


due to the fact that is the bucket in the last partition and is not fully populated.

We proceed to bound :


where Eq. (24) follows from and submodularity, Eq (25) and Eq (26) follow from monotonicity and submodularity, respectively. Eq. (27) follows from Eq. (23), and Eq. (28) follows from .

Finally, we have: