Stream Clipper: Scalable Submodular Maximization on Stream

by Tianyi Zhou, et al.
University of Washington

Applying submodular maximization in the streaming setting is nontrivial because the commonly used greedy algorithm exceeds the fixed memory and computational limits typically needed during stream processing. We introduce a new algorithm, called stream clipper, that uses two thresholds to select elements either into a solution set S or an extra buffer B. The output is achieved by a greedy algorithm that starts from S and then, if needed, greedily adds elements from B. Swapping elements out of S may also be triggered lazily for further improvements, and elements may also be removed from B (and corresponding thresholds adjusted) in order to keep memory use bounded by a constant. Although the worst-case approximation factor does not outperform the previous worst-case of 1/2, stream clipper can perform better than 1/2 depending on the order of the elements in the stream. We develop the idea of an "order complexity" to characterize orders on which an approximation factor of 1-α can be achieved. In news and video summarization tasks, stream clipper significantly outperforms other streaming methods. It shows similar performance to the greedy algorithm but with less computation and memory costs.




1. Introduction

Success in today’s machine learning and artificial intelligence algorithms relies largely on big data. Often, however, there may exist a small data subset that can act as a surrogate for the whole. Thus, various summarization methods have been designed to select such representative subsets and reduce redundancy. The problem is usually formulated as maximizing a score function $f: 2^V \to \mathbb{R}$ that assigns importance scores to subsets of an underlying ground set $V$ of all elements. Submodular functions are a useful class of functions for this purpose: a function $f$ is submodular (Fujishige, 2005) if for any subsets $A \subseteq B \subseteq V$ and any $v \in V \setminus B$,

$$f(A \cup \{v\}) - f(A) \ge f(B \cup \{v\}) - f(B). \tag{1}$$

Since the above diminishing returns property naturally captures the redundancy among elements in terms of their importance to a summary, submodular functions have been commonly used as objectives in summarization and machine learning applications. The importance of $v$’s contribution to $A$ is $f(v|A) = f(A \cup \{v\}) - f(A)$, called the “marginal gain” of $v$ conditioned on $A$.

The objective $f$ can be chosen from a rich class of submodular functions, e.g., facility location, saturated coverage, feature based, and entropy functions, among others. We focus on the most commonly used form: normalized and monotone non-decreasing submodular functions, i.e., $f(\emptyset) = 0$ and $f(A) \le f(B)$ for all $A \subseteq B \subseteq V$. In order for a summary to have a limited size, a cardinality constraint $|S| \le k$ is often applied, and this is the constraint we focus on in this paper. We also address, however, knapsack and matroid constraints in (Zhou and Bilmes, 2018). Under a cardinality constraint, the problem becomes

$$\max_{S \subseteq V,\ |S| \le k} f(S). \tag{2}$$

Submodular maximization is usually NP-hard. However, (2) can be solved near-optimally by a greedy algorithm with approximation factor $1 - 1/e$ (Nemhauser et al., 1978). Starting from $S = \emptyset$, the greedy algorithm selects the element with the largest marginal gain into $S$, i.e., $S \leftarrow S \cup \{\arg\max_{v \in V \setminus S} [f(S \cup \{v\}) - f(S)]\}$, until $|S| = k$. To accelerate the greedy algorithm without an objective value loss, the lazy greedy approach (Minoux, 1978; Leskovec et al., 2007) updates only the top element of a priority queue of marginal gains for all elements in $V \setminus S$ in each step. Recent approximate greedy algorithms (Iyer et al., 2013; Wei et al., 2014a; Mirzasoleiman et al., 2015) develop piece-wise, multi-stage, or random sampling strategies to trade off approximate optimality and speed.
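
To make the greedy and lazy greedy procedures concrete, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the names `marginal_gain`, `greedy`, `lazy_greedy`, and the toy coverage objective `coverage` with its data `COVER` are all hypothetical and do not come from the paper.

```python
import heapq
from typing import Callable, FrozenSet, Hashable, List, Sequence

SetFunction = Callable[[FrozenSet[Hashable]], float]


def marginal_gain(f: SetFunction, v: Hashable, S: FrozenSet[Hashable]) -> float:
    """Marginal gain of v conditioned on S: f(S + v) - f(S)."""
    return f(S | {v}) - f(S)


def greedy(f: SetFunction, ground: Sequence[Hashable], k: int) -> List[Hashable]:
    """Plain greedy: re-evaluate every remaining element in every round."""
    S: FrozenSet[Hashable] = frozenset()
    picked: List[Hashable] = []
    while len(picked) < min(k, len(ground)):
        rest = [v for v in ground if v not in S]
        best = max(rest, key=lambda v: marginal_gain(f, v, S))
        S = S | {best}
        picked.append(best)
    return picked


def lazy_greedy(f: SetFunction, ground: Sequence[Hashable], k: int) -> List[Hashable]:
    """Lazy greedy: refresh only the top of a priority queue of stale gains.

    By submodularity a stale gain is an upper bound on the current gain, so an
    entry that is on top of the queue with a fresh gain is a true maximizer.
    """
    S: FrozenSet[Hashable] = frozenset()
    picked: List[Hashable] = []
    # Entries: (-gain, tie-breaker index, element, round in which gain was computed).
    heap = [(-marginal_gain(f, v, S), i, v, 0) for i, v in enumerate(ground)]
    heapq.heapify(heap)
    while heap and len(picked) < k:
        _, i, v, stamp = heapq.heappop(heap)
        if stamp == len(picked):  # gain is fresh for the current S
            S = S | {v}
            picked.append(v)
        else:  # gain is stale: recompute and push back
            heapq.heappush(heap, (-marginal_gain(f, v, S), i, v, len(picked)))
    return picked


# Toy monotone submodular objective (hypothetical data): set coverage.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(S: FrozenSet[Hashable]) -> float:
    covered = set()
    for v in S:
        covered |= COVER[v]
    return float(len(covered))
```

Both functions return the selected elements in selection order. On this toy instance they agree, while the lazy variant skips re-evaluating elements whose stale gain is already too small to win.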

Figure 1. Left: Naïve stream clipper in Algorithm 1, is the stream, is the sequence of the selected elements; Right: Stream clipper with swapping and buffer cleaning in Algorithm 2, swapping replaces with and increases both and , buffer cleaning removes elements by increasing .

In various applications such as news digesting, video summarization (Mirzasoleiman et al., 2017a), music recommending and photo sharing, data is fed into a system as a stream and under a particular order. At any time point , the user can request a summary of the elements he/she has seen so far. The greedy algorithm and its variants are not appropriate to the streaming setting both for memory and computational reasons, i.e., they require storing all elements in advance, and computing their marginal gains each step. In this paper, we study how to solve (2) with for any in the streaming setting in one pass using a memory of size only , where is the number of buffered elements and is the number of elements in the solution set.

1.1. Related Work

Various strategies have been proposed in previous work to solve (2) in the streaming setting. A thresholding algorithm in (Badanidiyuru et al., 2014) adds an element to a summary if its marginal gain exceeds a threshold that depends on the global maximum of the objective. One function evaluation is required per step for computing the marginal gain. However, the global maximum is not known in advance for a stream, so the proposed sieve-streaming algorithm starts by running multiple instances of the thresholding algorithm with different estimates of it, and dynamically removes the instances whose estimates lie outside the interval updated by the maximal singleton gain. At the end, the instance achieving the largest objective value is used for the solution. It has a guarantee of $1/2 - \epsilon$ with $O(k \log k / \epsilon)$ memory. A sliding window method based on thresholding (Chen et al., 2016) has also been proposed that emphasizes recent data.
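
The single-threshold idea can be sketched as follows. This is a hedged illustration, not the reference implementation: the threshold formula `(opt_estimate / 2 - f(S)) / (k - |S|)` is recalled from Badanidiyuru et al. (2014) and is not recoverable from the text here, the function names are hypothetical, and for brevity the sketch replays a stored list once per estimate instead of running all instances in parallel over a single pass and pruning estimates on the fly.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

SetFunction = Callable[[FrozenSet[Hashable]], float]


def threshold_pass(stream: Iterable[Hashable], f: SetFunction, k: int,
                   opt_estimate: float) -> FrozenSet[Hashable]:
    """One thresholding instance for a single estimate of the optimum."""
    S: FrozenSet[Hashable] = frozenset()
    for v in stream:
        if len(S) >= k:
            break
        gain = f(S | {v}) - f(S)  # one new function evaluation per element
        threshold = (opt_estimate / 2.0 - f(S)) / (k - len(S))
        if gain >= threshold:
            S = S | {v}
    return S


def best_over_estimates(stream: Iterable[Hashable], f: SetFunction, k: int,
                        estimates: Iterable[float]) -> FrozenSet[Hashable]:
    """Simplification: try each estimate separately and keep the best solution."""
    items = list(stream)
    return max((threshold_pass(items, f, k, e) for e in estimates), key=f)


# Toy coverage objective (hypothetical data).
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(S: FrozenSet[Hashable]) -> float:
    covered = set()
    for v in S:
        covered |= COVER[v]
    return float(len(covered))
```

The memory cost of the real algorithm comes from keeping one candidate solution per live estimate, which is the burden the two-threshold scheme of this paper aims to avoid.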

Swapping between new elements and the ones in is a natural yet more computationally expensive strategy (Buchbinder et al., 2015; Chekuri et al., 2015; Gomes and Krause, 2010). The algorithm initializes with the first elements from the stream, and keeps replacing a new element and once (Buchbinder et al., 2015) or (Chekuri et al., 2015), where , and are nonnegative constants, and denotes the historical solution set right before adding to it. Both cases have guarantee (when for the former) with memory size . The latter requires less computation, i.e., one function evaluation per element, compared to evaluations required by the former.

A mini-batch based strategy splits the whole stream evenly into segments, and sequentially adds to the element with the largest marginal gain in each segment. It was introduced via the submodular secretary problem and its extensions (Bateni et al., 2013). This algorithm has an approximation bound of in expectation with memory size , if the data arrives in a uniformly random order. This method requires only one function evaluation per element, but it needs to know the length of the stream in advance, which is impossible when the stream is unboundedly large and a summary can be requested at any time.

A hardness result is given in Theorem 1.6 of (Buchbinder et al., 2015): for solving (2) in the online setting, there is no deterministic algorithm -competitive for any constant . In Lemma 4.7 of (Buchbinder et al., 2015) (Lemma 4.11 in its arXiv version), the approximation factor in the worst case cannot exceed unless and all the elements up to a summary request are stored in the memory. Note the online setting in (Buchbinder et al., 2015) is slightly different from our streaming setting in that it does not allow the buffering of unselected elements. (To quote (Buchbinder et al., 2015): “The elements of N arrive one by one in an online fashion. Upon arrival, the online algorithm must decide whether to accept each revealed element into its solution and this decision is irrevocable.”) However, it is trivial to generalize the -hardness to algorithms with buffer size . In particular, we consider the submodular function used in the proof of Lemma 4.7 in (Buchbinder et al., 2015), and use their notations for and : the hardness stays unless the algorithm buffers at least one , but since the algorithm cannot distinguish and until seeing the last element , it needs to buffer at least elements to ensure that one is stored in the buffer.

Different settings for streaming submodular maximization have also been studied recently. A robust streaming algorithm (Mirzasoleiman et al., 2017b) has been studied for when the data provider has the right to delete at most elements due to privacy concerns. Given any single-pass streaming algorithm with an -approximation guarantee, it runs a cascading chain of instances of such an algorithm with non-overlapping solutions to ensure that only one solution is affected by a deletion. Its solution still satisfies a -approximation guarantee when deletions are allowed. Another popularly studied setting is submodular maximization with sliding windows (Epasto et al., 2017), which aims to maintain a solution that takes only the last items into account.

In the present paper, we mainly focus on the classical streaming setting where deletions and sliding windows are not considered. Our method, however, can be applied as a streaming algorithm subroutine in the deletion-robust setting of (Mirzasoleiman et al., 2017b).

1.2. Our Approach

In practice, the thresholding algorithm must try a large number of thresholds (associated with different estimates of ) to obtain a sufficiently good solution, because the solution set is sensitive to tiny changes in threshold . This results in a high memory load. Though swapping and mini-batch strategies ask for a smaller memory size , the former requires function evaluations per step, while the latter needs to know in advance and requires uniformly random ordered elements, which cannot be justified in a streaming setting. Although the worst-case approximation factors of the three algorithms are , and respectively, they perform much worse in practice than the offline greedy algorithm, which has the worst-case approximation factor but usually performs much better than .

The main contribution of this paper is a novel streaming algorithm (that we call “stream clipper”) that can achieve similar empirical performance to the offline greedy algorithm, and we analyze when this is the case. It is given in Algorithm 1 and illustrated in the left plot of Figure 1. It uses two thresholds, a lower one and an upper one, to process each element: it adds the element to the solution set if its marginal gain clears the upper threshold; rejects it if its marginal gain falls below the lower threshold; otherwise (i.e., when the gain lies between the two thresholds) places it in a buffer. The final solution is generated by a greedy algorithm that starts from the obtained solution set and adds more elements from the buffer until the solution reaches the budget size. Since the elements with marginal gains slightly less than the upper threshold are saved in the buffer and given a second chance to be selected into the solution, the two-threshold scheme mitigates the instability of a single thresholding method without requiring the testing of a large number of different thresholds simultaneously.

According to the hardness analysis in (Buchbinder et al., 2015), the worst-case approximation factor of stream clipper cannot exceed for memory size . However, we explicitly show that in some cases when thresholds and fulfill certain data dependent conditions, its approximation factor lies in . In addition, given , and a data stream to process, we show simple conditions to justify when stream clipper can guarantee an approximation factor for any .

An advanced version of stream clipper is given in Algorithm 2 with illustration in the right plot in Figure 1. It allows an element in the buffer to replace some element in the solution set, if such swapping improves the objective. This avoids extra computation spent on swapping for every new element. In addition, the advanced version adapts thresholds to remove elements from the buffer once its size exceeds a user defined limit. This guarantees memory efficiency even for a poor initialization of the thresholds. In Section 3, experiments on news and video summarization show that stream clipper significantly outperforms other streaming algorithms consistently (Figures 2-5, Figure 10). In most experiments, it achieves an objective value as large as that of the offline greedy algorithm, and produces a summary of similar quality, but costs much less memory and computation due to its streaming setting.

2. Stream Clipper

In the following, we first introduce a naïve stream clipper and then later its advanced version with swapping, threshold adaptation, and buffer cleaning procedures. Detailed analysis of the approximation bound in different cases (rather than the worst case) for the naïve version follows. We further show the analysis can be extended to the advanced version. In the following, we use the letters “A” for Algorithm and “L” for line. For example, A1.L2-5 refers to Lines 2-5 of Algorithm 1.

2.1. Naïve Stream Clipper

Input : , , ,
Output : 
Initialize : ,
1 for  to  do
2       if  and  then
4       else if  then
6       else
7            Reject
9 while  do
10       ,
Algorithm 1 naïve_stream_clipper

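We first give a naïve version of stream clipper in Algorithm 1. It selects an element if its marginal gain clears the upper threshold and the solution set is not yet full, and stores it in the solution set (A1.L2-3), while it rejects an element whose marginal gain is below the lower threshold (A1.L7). It places an element whose marginal gain lies between the two thresholds (A1.L4) into the buffer (A1.L5). Once a summary is requested, a greedy algorithm (A1.L8-10) adds more elements from the buffer to the solution set until it reaches the budget size.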
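
The naïve procedure described above can be sketched in a few lines of Python. This is a sketch under assumptions, not the authors' code: the parameter names `tau_minus` and `tau_plus` for the lower and upper thresholds are hypothetical, and since the exact inequalities are not recoverable from the text, the sketch treats both comparisons as non-strict (`>=`). The toy coverage objective is also made up.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List

SetFunction = Callable[[FrozenSet[Hashable]], float]


def naive_stream_clipper(stream: Iterable[Hashable], f: SetFunction, k: int,
                         tau_minus: float, tau_plus: float) -> FrozenSet[Hashable]:
    S: FrozenSet[Hashable] = frozenset()  # solution set
    B: List[Hashable] = []                # buffer of "second chance" elements
    for v in stream:
        gain = f(S | {v}) - f(S)
        if gain >= tau_plus and len(S) < k:
            S = S | {v}                   # clearly important: select now
        elif gain >= tau_minus:
            B.append(v)                   # borderline: keep for later
        # otherwise the element is rejected and never stored
    # On a summary request, fill the remaining budget greedily from the buffer.
    while len(S) < k and B:
        best = max(B, key=lambda u: f(S | {u}) - f(S))
        B.remove(best)
        S = S | {best}
    return S


# Toy coverage objective (hypothetical data).
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}


def coverage(S: FrozenSet[Hashable]) -> float:
    covered = set()
    for v in S:
        covered |= COVER[v]
    return float(len(covered))
```

With the stream order `d, b, c, a`, budget 2, and thresholds 1 and 3, the first three elements land in the buffer, `a` is selected directly, and the final greedy step adds `b` from the buffer, which matches the offline greedy value on this toy instance.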

In the following, we use and to represent and at the end of the iteration of the for-loop in Algorithm 1. Note and are the solution and buffer after passing elements but before running the greedy procedure in A1.L8-10. We use to represent the final solution of Algorithm 1, use for the size of , and use to denote the selected element by A1.L3. In the above algorithm, the thresholds and are fixed, so tuning them is important for getting a good solution. However, in the advanced version introduced below, they are updated adaptively with the incoming data stream, and are thus more robust to the initialization values.

2.2. Advanced Stream Clipper

Input : , , ,
Output : 
Initialize : , , , ,
1 for  to  do
2       if  and  then
4       else if  then
5             if  then
7            else
10       else
11            Reject
12      while  do
15 while  do
16       ,
Algorithm 2 stream_clipper

In practice, we develop two additional strategies to (1) achieve further improvement by occasional swapping between buffered element in and element in solution , and (2) keep the buffer size by removing unimportant elements from . The advanced version of stream clipper after applying these two strategies is given in Algorithm 2, where A2.L5-10 denotes the first strategy, and A2.L15-17 denotes the second strategy. Algorithm 2 is the same as Algorithm 1 if we ignore these steps.

The swapping procedure in A2.L5-10 is applied only to the new element whose marginal gain is between and . A2.L5 computes the objective for all the possible swappings between and element , and finds achieving the maximal objective . A2.L6 computes , the average of the swapping gain on the objective over all elements in . If , which means swapping brings positive improvements to the objective, the swapping is committed as in A2.L10. Compared to previous swapping methods (Buchbinder et al., 2015) that compute for every new element , stream clipper only computes A2.L5 for such that . This improves the efficiency since computing requires function evaluations.
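
The core of the swapping step, evaluating every possible exchange between a new element and a member of the solution, can be sketched as below. The acceptance rule of Algorithm 2 (which involves the average swapping gain and the thresholds) is not recoverable from the text here, so this hypothetical `try_swap` helper uses the simpler rule stated in the introduction as a stand-in: commit the best swap only if it strictly improves the objective.

```python
from typing import Callable, FrozenSet, Hashable

SetFunction = Callable[[FrozenSet[Hashable]], float]


def try_swap(S: FrozenSet[Hashable], v: Hashable,
             f: SetFunction) -> FrozenSet[Hashable]:
    """Return the best solution reachable by swapping v for one element of S.

    Costs one function evaluation per element of S (plus one for f(S)), which
    is why restricting swaps to borderline elements saves computation.
    Assumed rule: keep S unless some swap strictly increases f.
    """
    best_set, best_val = S, f(S)
    for u in S:
        candidate = (S - {u}) | {v}
        val = f(candidate)
        if val > best_val:
            best_set, best_val = candidate, val
    return best_set


# Toy coverage objective (hypothetical data).
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "d": {1, 2}, "e": {1, 2, 5, 6}}


def coverage(S: FrozenSet[Hashable]) -> float:
    covered = set()
    for v in S:
        covered |= COVER[v]
    return float(len(covered))
```

Here swapping `e` for `d` in `{d, b}` raises coverage from 4 to 6 and is committed, whereas offering `a` changes nothing because no exchange strictly improves the objective.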

When the buffer size reaches the user defined limit , stream clipper increases by step size as shown in A2.L16. Since the lower threshold increases, elements in buffer whose marginal gain can be removed from (A2.L17). We repeat this buffer cleaning procedure until . Note the maximal value of after it increases is , because if .
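
The buffer cleaning loop can be sketched as follows, under explicit assumptions: the names `clean_buffer`, `tau_minus`, `tau_plus`, `step`, and `limit` are hypothetical, marginal gains are taken with respect to the current solution `S`, and the lower threshold is capped at the upper one as the text indicates. The sketch does not add any fallback for the case where the cap is reached while the buffer is still over the limit.

```python
from typing import Callable, FrozenSet, Hashable, List, Tuple

SetFunction = Callable[[FrozenSet[Hashable]], float]


def clean_buffer(B: List[Hashable], S: FrozenSet[Hashable], f: SetFunction,
                 tau_minus: float, tau_plus: float, step: float,
                 limit: int) -> Tuple[List[Hashable], float]:
    """Raise the lower threshold until the buffer fits within `limit`."""
    while len(B) > limit and tau_minus < tau_plus:
        tau_minus = min(tau_minus + step, tau_plus)  # never exceed the upper threshold
        base = f(S)
        # Drop buffered elements whose current marginal gain fell below tau_minus.
        B = [v for v in B if f(S | {v}) - base >= tau_minus]
    return B, tau_minus


# Toy coverage objective (hypothetical data).
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}, "e": {6, 7}}


def coverage(S: FrozenSet[Hashable]) -> float:
    covered = set()
    for v in S:
        covered |= COVER[v]
    return float(len(covered))
```

For example, with solution `{a}`, buffer `[b, c, d, e]`, limit 2, and step 1, the first increase removes `d` and the second removes `b` and `c`, leaving only `e` and a lower threshold of 2.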

In Algorithm 2, parameter is an estimate of . In practice, it can be initialized as and increased to according to solution set achieved in later steps. We initialize the “step size” as since it works well empirically. The two thresholds are initialized as shown in Algorithm 2. Note we can start with a sufficiently small to guarantee and , and adaptively increase it later as in A2.L16.

2.3. Approximation Bound

We study the approximation bound of Algorithm 1 in different cases rather than the worst case. Firstly, we assume is properly selected so . This guarantees elements are selected into by the greedy algorithm in A1.L8-10 and thus there are elements in the final output . A trivial choice of is .

Lemma 1.

If and , then before A1.L8.

When , all the elements whose marginal gain is less than will be stored in the buffer, and may lead to a large . Note the advanced version Algorithm 2 can start from , and adaptively increase it and clean the buffer when exceeds the limit . Following a proof technique similar to that in (Nemhauser et al., 1978), we have the theorem below. Please refer to (Zhou and Bilmes, 2018) for its proof.

Theorem 2.

If submodular function is monotone non-decreasing and normalized, let , the following result holds for the final output of Algorithm 1.


The bound in (3) is a convex combination of and . It depends on , , , and : is known once a summary is requested; thresholds and are pre-defined parameters; is the optimum we need to compare to. However, is the number of elements from optimal set that have been rejected by A1.L7. It depends on that may not be known. In order to remove the dependency on , we take the minimum of the right hand side of (3) over all possible values of . We use to denote the right hand side of (3),


Since has a complex shape, we firstly study its first and second order derivatives.

Lemma 3.

The derivative and second order derivative of are



Proposition 4.

When , the minimum value of the bound given in (3) w.r.t. is either , or .

By using Proposition 4, we can derive the minimum value of in three different cases, which corresponds to three ranges of determined by , and . This leads to the following theorem.

Theorem 5.

Under the assumptions of Theorem 2, we have

Case 1: when ,


Case 2: when ,


Case 3: when ,


Remarks: In case 1, when , buffer and Algorithm 1 reduces to sieve-streaming (Badanidiyuru et al., 2014), so the bound is . In the following corollary, we further show in cases 2 & 3, better (i.e., ) bounds can be achieved when , since the greedy algorithm in the end of Algorithm 1 further takes advantage of elements from buffer .

Corollary 6.

Under the assumptions of Theorem 2, when (case-1), if and , . When (case-2&3), if , .

According to Corollary 6, although the approximation factor is possible to be for cases 2 & 3, the worst case bound is still . This obeys the hardness given in (Buchbinder et al., 2015), i.e., it is impossible to improve the worst-case bound over . However, the bound can be strictly better than on specific orders of the same set of elements . Given thresholds and , for a data stream with a specific order and an , we give the conditions to justify whether stream clipper can achieve an approximation factor .

In the following analysis, we use , a sequence of distinct integers from to , to denote the order of elements in the stream, i.e., . We use to represent the set of all orders. By analyzing the three cases in Theorem 5, we can locate and in specific ranges. In each range, we characterize the orders on which and buffer size is bounded by , i.e., .

Proposition 7.

1) For any , given and to use in stream clipper (Algorithm 1), define , where


if , for any order , we have .

2) For any , given and to use in stream clipper (Algorithm 1), define




for any order , we have .

The detailed proof is given in (Zhou and Bilmes, 2018). In the advanced version of stream clipper, we can adjust and to guarantee a nonempty . The conditions in (10) can provide some clues of how to adjust them based on the updated estimate of . According to Proposition 7, given , , and any , for the orders on which stream clipper achieves 1) and when , or 2) and for every when , we have with .

Remarks: We can easily extend the above analysis of Algorithm 1 to Algorithm 2 by replacing and in them with and (the thresholds after step ) respectively. Details are given in (Zhou and Bilmes, 2018).

Figure 2. Utility and time cost vs. length of data stream on the same data of different random orders. Stream clipper achieves similar utility as offline greedy, but has computational costs similar to other streaming algorithms (i.e., much less than the offline greedy).
Figure 3. Relative utility vs. . Different from sieve-streaming, stream clipper does not heavily rely on an accurate estimate to guarantee a large utility, because it can adaptively tune the thresholds on the fly.
Figure 4. Relative utility and time cost vs. memory size . It shows the advantage of stream clipper via the trade-off between memory usage and utility value. Stream clipper needs only to buffer sentences out of the in the whole stream to obtain almost the same utility as the offline greedy procedure, while its time cost is similar to the other streaming algorithm.

3. Experiments

In this section, on several news and video datasets, we compare summaries generated by stream clipper and other algorithms. We use the feature based submodular function (Wei et al., 2014b) as our objective, where is a set of features, and is a modular score ( is the affinity of element to feature ). This function typically achieves good performance on summarization tasks. Our baseline algorithms are the lazy greedy approach (Minoux, 1978) (which produces output identical to greedy but is faster) and the “sieve-streaming” (Badanidiyuru et al., 2014) approach for streaming submodular maximization, which has low memory requirements as it takes one pass over the data. Note in summarization experiments, a difference of on utility usually leads to a large gap on rouge-2 and F1-score.
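
A feature based objective of this kind sums, over features, a concave function of the modular score accumulated by the selected elements. The sketch below is illustrative only: the concave function used in the paper is not recoverable from the text here, so the square root is an assumption, and the names `feature_based` and `W` as well as the toy affinities are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, Hashable, List


def feature_based(weights: Dict[Hashable, List[float]],
                  concave: Callable[[float], float] = math.sqrt
                  ) -> Callable[[FrozenSet[Hashable]], float]:
    """Build f(S) = sum over features u of concave(sum over v in S of weights[v][u]).

    `weights[v][u]` is the non-negative affinity of element v to feature u.
    Assumes `weights` is non-empty and all rows have the same length.
    """
    n_features = len(next(iter(weights.values())))

    def f(S: FrozenSet[Hashable]) -> float:
        totals = [0.0] * n_features  # modular score of S for each feature
        for v in S:
            for u, w in enumerate(weights[v]):
                totals[u] += w
        return sum(concave(c) for c in totals)

    return f


# Toy affinities (hypothetical): s1 and s3 are redundant, s2 is complementary.
W = {"s1": [4.0, 0.0], "s2": [0.0, 9.0], "s3": [4.0, 0.0]}
f = feature_based(W)
```

Because the concave function saturates, adding `s3` after `s1` gains less than adding it to the empty set, which is exactly the diminishing returns behavior that penalizes redundant sentences or frames.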

3.1. Empirical Study on News

An empirical study is conducted on a ground set containing sentences from all NYT articles on a randomly selected date between 1996 and 2007, which are from the NYT annotated corpus 1996-2007. Figure 2 shows how the utility and time cost vary when we change the length of the data stream. We set the budget size of the summary to be the number of sentences in a human generated summary. The buffer size of stream clipper is fixed to , while the number of trials in sieve-streaming is , leading to memory requirement of , which is much larger than that of stream clipper. In order to test how performance varies with the order of the stream, for each stream length, we run the same experiment on different random orders of the same data.

Figure 5. Statistics of relative utility , rouge-2 score and F1-score on daily news summarization results of days’ news from New York Times corpus between 1996-2007. Stream clipper achieves relative utility close to for most days. It has a similar or larger number of days than lazy greedy in the bins of high () rouge-2 and F1-score.
Figure 6. Length of data stream vs. time cost (exponential scale) on daily news summarization of days’ news from New York Times corpus between 1996-2007. The area of each circle is proportional to relative utility . The time cost of stream clipper grows more slowly than that of lazy greedy and saturates when . The time cost of sieve-streaming increases at first, but becomes small and does not change after . This is because the algorithm quickly fills with elements when and does not change anymore. However, this prevents new elements from being enrolled and leads to worse relative utility, reflected by the smaller blue circles.

The utility and time cost of both streaming algorithms do not change too much when the order changes. The utility curve of stream clipper overlaps that of lazy greedy, while its time cost is much less and increases more slowly than that of lazy greedy. Sieve-streaming performs much worse than stream clipper in terms of utility, and its time cost is only slightly less and even slightly decreases when increasing (this is because it quickly fills with elements and stops much earlier before seeing all elements).

Figure 4 shows how relative utility ( denotes the solution of the offline greedy algorithm) and time cost of the two streaming algorithms vary with memory size. Stream clipper quickly reaches a close to of greedy algorithm once exceeds , while sieve-streaming achieves much smaller which does not increase until . Note the time cost of stream clipper is larger than that of sieve-streaming when but dramatically decreases below it quickly. This is because the buffer cleaning procedure in A2.L15-17 needs to be frequently executed if is small (and is small). However, a slight increase in memory size can effectively reduce the time cost.

Figure 3 shows the robustness of the two streaming algorithms to parameter . In the wide range of , stream clipper keeps a relative utility, while sieve-streaming decreases dramatically around its peak value . Hence, sieve-streaming is more sensitive to and thus a delicate search of is necessary. This results in a high memory burden. By contrast, our approach adaptively adjusts two thresholds via swapping and buffer cleaning even when the estimate used to initialize them is inaccurate.

3.2. NYT News Summarization

In this section, we conduct summarization experiments on two news corpora, The New York Times annotated corpus 1996-2007 and the DUC 2001 corpus.

The first dataset includes all the articles published on The New York Times in days from 1996-2007. For each day, we collect the sentences in articles associated with human generated summaries as the ground set (with sizes varying from to ), and extract their TFIDF features to build . We concatenate the sentences from all human generated summaries in the same date as reference summary. We compare the machine generated summaries produced by different methods with the reference summary by ROUGE-2 (Lin, 2004) (recall on 2-grams) and ROUGE-2 F1-score (F1-measure based on recall and precision on 2-grams). We also compare their relative utility. As before, sieve-streaming holds a memory size of . Figure 5 shows the statistics over days.
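
For readers unfamiliar with the two metrics, the following sketch computes bigram recall (ROUGE-2) and the corresponding F1-score against a single reference. It is a simplified illustration, not the official ROUGE toolkit: tokenization is plain lowercase whitespace splitting with no stemming or stop-word handling, and the function names are hypothetical.

```python
from collections import Counter
from typing import Tuple


def bigrams(text: str) -> Counter:
    """Multiset of adjacent token pairs after lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) on 2-grams with clipped counts."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


recall, precision, f1 = rouge2("the cat sat", "the cat sat on the mat")
```

In this example the candidate shares 2 of the reference's 5 bigrams, so recall is 0.4 while precision is 1.0. Recall alone rewards long summaries, which is why the F1-score is reported alongside it.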

Stream clipper keeps a relative utility for most days, while sieve-streaming dominates the region. The ROUGE-2 score of stream clipper is usually better than sieve-streaming, but slightly worse than lazy greedy. However, its F1-score is very close to that of lazy greedy, while sieve-streaming’s is much worse.

Figure 6 shows the number of collected sentences in each day and the corresponding time cost of each algorithm. The area of each circle is proportional to the relative utility. We use a log scale time axis for better visualization. Stream clipper is times faster than lazy greedy. Their time costs grow at a similar rate, because as the summary size increases, the greedy stage in stream clipper tends to dominate the computation. The time cost of sieve-streaming decreases when , but its relative utility also drops quickly. This is caused by the aforementioned early stopping.

Figure 7. Statistics of relative utility , rouge-2 score and F1-score on topic based news summarization results of document sets from DUC2001 training and test set, comparing to -word human generated summary.
Figure 8. Statistics of relative utility , rouge-2 score and F1-score on topic based news summarization results of document sets from DUC2001 training and test set, comparing to -word human generated summary.

3.3. DUC2001 News Summarization

Algorithm words Daycare Healthcare Pres92 Robert Gates
rouge2 F1 rouge2 F1 rouge2 F1 rouge2 F1
Lazy Greedy 400
Sieve-Streaming 400
Stream Clipper 400
Table 1. Performance of lazy greedy, sieve-streaming, and stream clipper on four topic summarization datasets from DUC 2001. For each topic, the machine generated summary is compared to four human generated ones having word count from 50 to 400.
Figure 9. F1-score of the summaries generated by greedy (yellow bar), sieve-streaming (cyan bar), stream clipper (magenta bar) and the first frames (green bar) compared to reference summaries from users on videos from SumMe dataset. Each plot corresponds to a video. Stream clipper performs similar to or better than lazy greedy in most plots.

We also observe similar results on the DUC 2001 corpus, which is composed of two datasets. The first one includes sets of documents, each selected by a NIST assessor because the documents in a set are related to the same topic. The assessor also provides four human generated summaries having word counts for each set. In Figure 7 and Figure 8, we report the statistics of rouge-2 and F1-score of summaries of the same size generated by different algorithms. The second dataset is composed of four document sets associated with four topics. We report the detailed results in Table 1. Both of them show that stream clipper can achieve performance similar to the offline greedy algorithm, while outperforming sieve-streaming.

3.4. Video Summarization

Figure 10. F1-score of the summaries generated by lazy greedy (yellow “”), sieve-streaming (cyan “”), stream clipper (magenta “”) and the first frames (green “”) compared to reference summaries of different sizes between based on ground truth score (voting from users) on videos from SumMe. Each plot corresponds to a video. Stream clipper performs similar to or better than lazy greedy in most plots where sieve-streaming performs poorly. In the plots where sieve-streaming outperforms others, its performance usually overlaps with that of the first frames. This is consistent with our observation in experiments that sieve-streaming usually saturates the solution by the first several frames and thus results in a trivial solution .

We apply lazy greedy, sieve-streaming, and stream clipper to videos from the video summarization dataset SumMe (Gygli et al., 2014). Each video has frames as given in Table 2 (Zhou and Bilmes, 2018). We resize each frame to a image, and extract features from two standard image descriptors, i.e., a pyramid of HoG (pHoG) (Bosch et al., 2007) to delineate local and global shape, and GIST (Oliva and Torralba, 2001) to capture global scene. The pHoG features are obtained over a four-level pyramid using bins with angle of degrees. The GIST features are obtained by using blocks and orientation per scale. We concatenate them to form a -dimensional feature vector for each frame to build . Each algorithm selects of all frames as summary set, i.e., . Sieve-streaming uses a memory of frames, while stream clipper uses a much smaller memory of frames.

We compare the summaries generated by the three algorithms with the ones produced by the ground truth and users. Each user was asked to select a subset of frames as summary, and ground truth score of each frame is given by voting from all users. For each video, we compare each algorithm generated summary with the reference summary composed of the top frames with the largest ground truth scores for different , and the user summary from different users. In particular, we report F1-score for comparison to ground truth score generated summaries in Figure 10 (recall comparison is given in Figure 11 (Zhou and Bilmes, 2018)). We report F1-score for comparison to user summaries in Figure 9 (recall comparison is given in Figure 12 (Zhou and Bilmes, 2018)). In each plot for each video, we also report the average F1-score and average recall over all users.

Stream clipper approaches or outperforms lazy greedy and shows a high F1-score on most videos, while its time cost is small according to Table 2. Although sieve-streaming achieves the best F1-score on a few videos, in most of these cases its generated summaries are trivially dominated by the first frames, as shown in Figures 9-12 (Zhou and Bilmes, 2018). On these videos, neither lazy greedy nor stream clipper performs well, though they achieve high objective values in optimization. This indicates that the extracted features of the submodular function should be improved.

4. Conclusion

In this paper, we introduce stream clipper, a fast and memory-efficient streaming submodular maximization algorithm that can achieve performance similar to the commonly used greedy algorithm. It uses two thresholds to place important elements either into the summary or into a buffer. The final summary is generated by greedily selecting more elements from the buffer. Swapping and buffer-reduce procedures are triggered lazily for further improvement and to bound memory. Thresholds are adjusted adaptively to avoid searching for the optimal thresholds.
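The two-threshold idea summarized above can be illustrated with a deliberately simplified sketch. It is not the algorithm from the paper: it omits swapping, buffer reduction, and adaptive threshold adjustment, it assumes both thresholds are applied directly to the marginal gain, and the coverage objective, function names, and threshold values are all hypothetical.

```python
def coverage(sets, chosen):
    """Toy monotone submodular objective: number of items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def two_threshold_stream(sets, k, tau_high, tau_low):
    """One pass with two thresholds, then greedy completion from the buffer."""
    f = lambda chosen: coverage(sets, chosen)
    S, B = [], []
    for v in range(len(sets)):  # elements arrive in stream order
        gain = f(S + [v]) - f(S)
        if len(S) < k and gain >= tau_high:
            S.append(v)  # high gain: select immediately
        elif gain >= tau_low:
            B.append(v)  # borderline gain: keep in the buffer
        # anything else is rejected and never stored
    while len(S) < k and B:  # fill the remaining slots greedily from B
        best = max(B, key=lambda v: f(S + [v]) - f(S))
        if f(S + [best]) - f(S) <= 0:
            break
        S.append(best)
        B.remove(best)
    return S, B


# Hypothetical stream: each element covers a small set of items.
sets = [{1, 2, 3}, {3, 4}, {5}, {1, 2}, {6, 7, 8, 9}, {4, 5}]
S, B = two_threshold_stream(sets, k=3, tau_high=3, tau_low=1)
```

On this toy stream, two elements pass the upper threshold during the pass and the third slot is filled from the buffer afterwards, so only the buffer (not the whole stream) is revisited by the greedy step.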


  • Badanidiyuru et al. (2014) Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2014. Streaming Submodular Maximization: Massive Data Summarization on the Fly. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 671–680.
  • Bateni et al. (2013) Mohammadhossein Bateni, Mohammadtaghi Hajiaghayi, and Morteza Zadimoghaddam. 2013. Submodular Secretary Problem and Extensions. ACM Trans. Algorithms 9, 4 (2013), 32:1–32:23.
  • Bosch et al. (2007) Anna Bosch, Andrew Zisserman, and Xavier Munoz. 2007. Representing shape with a spatial pyramid kernel. In ACM International Conference on Image and Video Retrieval. 401–408.
  • Buchbinder et al. (2015) Niv Buchbinder, Moran Feldman, and Roy Schwartz. 2015. Online Submodular Maximization with Preemption. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms. 1202–1216.
  • Chekuri et al. (2015) Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. 2015. Streaming Algorithms for Submodular Function Maximization. In 42nd International Colloquium of Automata, Languages, and Programming (ICALP) Part I. 318–330.
  • Chen et al. (2016) Jiecao Chen, Huy L. Nguyen, and Qin Zhang. 2016. Submodular Maximization over Sliding Windows. arXiv (2016).
  • Epasto et al. (2017) Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. 2017. Submodular Optimization Over Sliding Windows. In International Conference on World Wide Web (WWW). 421–430.
  • Fujishige (2005) Satoru Fujishige. 2005. Submodular functions and optimization. Elsevier.
  • Gomes and Krause (2010) Ryan Gomes and Andreas Krause. 2010. Budgeted Nonparametric Learning from Data Streams. In Proc. International Conference on Machine Learning (ICML).
  • Gygli et al. (2014) Michael Gygli, Helmut Grabner, Hayko Riemenschneider, and Luc Van Gool. 2014. Creating Summaries from User Videos. In European Conference on Computer Vision (ECCV).

  • Iyer et al. (2013) Rishabh Iyer, Stefanie Jegelka, and Jeff A. Bilmes. 2013. Fast Semidifferential-based Submodular Function Optimization. In International Conference on Machine Learning (ICML).
  • Leskovec et al. (2007) Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. 2007. Cost-effective Outbreak Detection in Networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 420–429.
  • Lin (2004) Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. 74–81.
  • Minoux (1978) Michel Minoux. 1978. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques. Lecture Notes in Control and Information Sciences, Vol. 7. Chapter 27, 234–243.
  • Mirzasoleiman et al. (2015) Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrák, and Andreas Krause. 2015. Lazier Than Lazy Greedy. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 1812–1818.
  • Mirzasoleiman et al. (2017a) Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. 2017a. Streaming Non-monotone Submodular Maximization: Personalized Video Summarization on the Fly. arXiv (2017).
  • Mirzasoleiman et al. (2017b) Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. 2017b. Deletion-Robust Submodular Maximization: Data Summarization with “the Right to be Forgotten”. In International Conference on Machine Learning (ICML), Vol. 70. 2449–2458.
  • Nemhauser et al. (1978) G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming 14, 1 (1978), 265–294.
  • Oliva and Torralba (2001) Aude Oliva and Antonio Torralba. 2001. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision 42, 3 (2001), 145–175.
  • Wei et al. (2014a) Kai Wei, Rishabh Iyer, and Jeff Bilmes. 2014a. Fast Multi-stage Submodular Maximization. In International Conference on Machine Learning (ICML).
  • Wei et al. (2014b) Kai Wei, Yuzong Liu, Katrin Kirchhoff, Chris D. Bartels, and Jeff A. Bilmes. 2014b. Submodular subset selection for large-scale speech training data. In IEEE International Conference on Acoustics, Speech and Signal Processing, (ICASSP) 2014. 3311–3315.
  • Zhou and Bilmes (2018) Tianyi Zhou and Jeff Bilmes. 2018. Appendix for Stream Clipper. In Submitted.

Appendix A Proof of Theorem 2


We use to index the step of the greedy algorithm in A1.L8-10, while indexes variables after passing elements and before the greedy procedure in A1.L8-10. Note indexes the final step of the greedy procedure. We have


The first inequality uses monotonicity of , while the second one is due to submodularity.

The third inequality follows from set theory along with the fact that is non-negative monotone non-decreasing. The fourth inequality is a result of applying the rejection rule to the rejected elements in , and the max greedy selection rule in A1.L9. Rearranging (13) yields




then the rearranged inequality is equivalent to


When and , this is exactly


Since in total elements are selected by the greedy algorithm, applying (17) from to yields


which is equivalent to


by applying the definition of . The last inequality is due to


which is due to the selection rule used in A1.L2. For each selected element , in (20) is the solution at the beginning of the step. We simply use the telescoping-sum representation of to achieve the equality in (20).
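For reference, the generic telescoping-sum identity for a set function can be written out as below. The notation is an assumption made for this illustration and may differ from the symbols used in (20): f is the objective, v_1, ..., v_m are the selected elements in the order they were added, and S_{i-1} is the solution just before v_i is added.

```latex
f(S_m) = f(S_0) + \sum_{i=1}^{m} \bigl[ f(S_{i-1} \cup \{v_i\}) - f(S_{i-1}) \bigr],
\qquad S_i = S_{i-1} \cup \{v_i\}.
```

Each bracketed term is the marginal gain of one selected element, so a lower bound on every marginal gain from a selection rule translates directly into a lower bound on the total objective value.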

When , or , or both are zero, (16) implies , which leads to


Note that the right-hand side of (19) is a convex combination of and , and thus is smaller than or equal to their maximum. If