Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams

08/06/2018
by Ashkan Norouzi-Fard et al.

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, and clustering, require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive rapidly over time, so the algorithm must be efficient and use little memory. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm, SALSA, for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor 0.5, under the natural assumption that elements arrive in random order. We also show that this assumption is necessary, i.e., that no such algorithm achieves better than a 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that SALSA significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.
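
The 0.5-approximation baseline mentioned above (Badanidiyuru et al., 2014) is usually described as a threshold "sieve": for each guess v of the optimum, keep a candidate set and add an arriving element only if its marginal gain clears (v/2 - f(S)) / (k - |S|). Below is a minimal Python sketch of that rule, not SALSA itself (whose details are in the full paper). It assumes a monotone submodular f with f(empty set) = 0, supplied as a callable on frozensets, and distinct stream elements; the names sieve_streaming, coverage and the toy data are hypothetical, chosen only for illustration. With a geometric grid of guesses of ratio (1 + eps), the guarantee is (1/2 - eps) times the optimum.

```python
import math
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Set

SetFn = Callable[[FrozenSet[Hashable]], float]


def sieve_streaming(stream: Iterable[Hashable], f: SetFn, k: int,
                    eps: float = 0.1) -> Set[Hashable]:
    """Hypothetical sketch: single-pass threshold sieve for max f(S), |S| <= k.

    Assumes f is monotone submodular with f(frozenset()) == 0 and that
    stream elements are distinct.
    """
    m = 0.0  # largest singleton value seen so far (a lower bound on OPT)
    sieves: Dict[int, Set[Hashable]] = {}  # grid index i -> candidate set for v = (1+eps)^i
    for e in stream:
        m = max(m, f(frozenset([e])))
        if m <= 0:
            continue
        # Guesses v = (1+eps)^i with m <= v <= 2*k*m bracket OPT.
        lo = math.ceil(math.log(m, 1 + eps))
        hi = math.floor(math.log(2 * k * m, 1 + eps))
        for i in list(sieves):  # discard guesses that fell below the bound m
            if i < lo:
                del sieves[i]
        for i in range(lo, hi + 1):
            S = sieves.setdefault(i, set())
            if len(S) >= k:
                continue
            v = (1 + eps) ** i
            fS = f(frozenset(S))
            gain = f(frozenset(S | {e})) - fS
            # Threshold rule: keep e only if it pays its share of v/2.
            if gain >= (v / 2 - fS) / (k - len(S)):
                S.add(e)
    return max(sieves.values(), key=lambda S: f(frozenset(S)), default=set())


# Toy usage with a coverage function (hypothetical data).
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}


def coverage(S: FrozenSet[Hashable]) -> float:
    return float(len(set().union(*(sets[x] for x in S))))


result = sieve_streaming(["a", "b", "c", "d"], coverage, k=2)
print(sorted(result))  # ['a', 'c'], covering all 7 items
```

Keeping one sieve per guess is what makes the method single-pass: no guess of the optimum has to be known in advance, at the cost of O(log(k)/eps) candidate sets held in memory.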

10/09/2020

Streaming Submodular Maximization with Fairness Constraints

We study the problem of extracting a small subset of representative item...
11/03/2014

Distributed Submodular Maximization

Many large-scale machine learning problems--clustering, non-parametric l...
11/14/2018

Submodular Optimization Over Streams with Inhomogeneous Decays

Cardinality constrained submodular function maximization, which aims to ...
08/16/2022

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Maximizing a submodular function is a fundamental task in machine learni...
06/01/2016

Stream Clipper: Scalable Submodular Maximization on Stream

Applying submodular maximization in the streaming setting is nontrivial ...
04/27/2020

Robust Algorithms under Adversarial Injections

In this paper, we study streaming and online algorithms in the context o...
01/07/2019

Approximate-Closed-Itemset Mining for Streaming Data Under Resource Constraint

Here, we present a novel algorithm for frequent itemset mining for strea...