Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams

by   Ashkan Norouzi-Fard, et al.

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.


page 1

page 2

page 3

page 4


Streaming Submodular Maximization with Fairness Constraints

We study the problem of extracting a small subset of representative item...

Distributed Submodular Maximization

Many large-scale machine learning problems--clustering, non-parametric l...

Submodular Optimization Over Streams with Inhomogeneous Decays

Cardinality constrained submodular function maximization, which aims to ...

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Maximizing a submodular function is a fundamental task in machine learni...

Stream Clipper: Scalable Submodular Maximization on Stream

Applying submodular maximization in the streaming setting is nontrivial ...

Robust Algorithms under Adversarial Injections

In this paper, we study streaming and online algorithms in the context o...

Approximate-Closed-Itemset Mining for Streaming Data Under Resource Constraint

Here, we present a novel algorithm for frequent itemset mining for strea...