Very Fast Streaming Submodular Function Maximization

10/20/2020
by Sebastian Buschjäger, et al.

Data summarization has become a valuable tool for understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. These algorithms offer worst-case approximation guarantees at the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst case and are usually much better behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst case but delivers a good solution with high probability. It selects the most informative items from a data stream on the fly and maintains provable performance on a fixed memory budget. In an extensive evaluation, we compare our method against 6 other methods on 8 different datasets with and without concept drift. We show that our algorithm outperforms current state-of-the-art algorithms while using fewer resources. Finally, we highlight a real-world use case of our algorithm for data summarization in gamma-ray astronomy. We make our code publicly available at https://github.com/sbuschjaeger/SubmodularStreamingMaximization.
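The abstract only outlines ThreeSieves. As a rough illustration of what "selecting informative items from a stream on the fly with a fixed memory budget" can look like, the Python sketch below keeps one summary of at most K items and a single guess v of the optimal value. Everything concrete in it is an assumption for illustration, not taken from this text: the coverage objective, the SieveStreaming-style acceptance rule (accept if the marginal gain is at least (v/2 - f(S)) / (K - |S|)), the geometric grid of guesses, and the policy of lowering v after T consecutive rejections. The function names are hypothetical; the authors' actual implementation is in the linked repository.

```python
from typing import Callable, Hashable, Iterable, List, Set

Item = Set[Hashable]


def coverage(summary: List[Item]) -> int:
    """Example monotone submodular objective: number of distinct elements covered."""
    covered: Set[Hashable] = set()
    for s in summary:
        covered |= s
    return len(covered)


def single_threshold_sieve(
    stream: Iterable[Item],
    f: Callable[[List[Item]], float],
    K: int,
    max_singleton: float,
    eps: float = 0.1,
    T: int = 50,
) -> List[Item]:
    """Hypothetical one-pass selector keeping at most K items and one OPT guess.

    Assumptions (illustrative only): `max_singleton` bounds f([e]) for every item,
    items are accepted with a SieveStreaming-style rule, and the OPT guess is
    lowered after T consecutive rejections.
    """
    if K < 1 or max_singleton <= 0 or eps <= 0:
        raise ValueError("need K >= 1, max_singleton > 0 and eps > 0")

    # Geometric grid of OPT guesses from K * m down to m, largest first.
    thresholds: List[float] = []
    v = float(K * max_singleton)
    while v >= max_singleton:
        thresholds.append(v)
        v /= 1 + eps

    idx = 0          # index of the current OPT guess in `thresholds`
    rejections = 0   # consecutive rejections under the current guess
    summary: List[Item] = []
    f_summary = f(summary)

    for item in stream:
        if len(summary) == K:
            break  # memory budget exhausted
        v = thresholds[idx]
        gain = f(summary + [item]) - f_summary
        # Accept if the gain is large enough to reach v/2 with the remaining slots.
        if gain >= (v / 2 - f_summary) / (K - len(summary)):
            summary.append(item)
            f_summary += gain
            rejections = 0
        else:
            rejections += 1
            # After T rejections in a row, treat the current guess as too optimistic.
            if rejections >= T and idx < len(thresholds) - 1:
                idx += 1
                rejections = 0
    return summary
```

For example, with the stream `[{1}, {2}, {3}, {4}, {5}]`, `K=3`, `max_singleton=3`, `eps=0.5` and `T=2`, the first two items are rejected under the guess 9, the guess drops to 6, and the call returns `[{3}, {4}, {5}]`. With `T=100` the guess never drops and nothing is selected, which shows why some rule for lowering an overly optimistic guess is needed when only one threshold is kept in memory.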

research
06/01/2016

Stream Clipper: Scalable Submodular Maximization on Stream

Applying submodular maximization in the streaming setting is nontrivial ...
research
02/10/2021

Budget-Smoothed Analysis for Submodular Maximization

The greedy algorithm for submodular function maximization subject to car...
research
07/14/2019

The FAST Algorithm for Submodular Maximization

In this paper we describe a new algorithm called Fast Adaptive Sequencin...
research
02/20/2018

Do Less, Get More: Streaming Submodular Maximization with Subsampling

In this paper, we develop the first one-pass streaming algorithm for sub...
research
03/20/2023

High Probability Bounds for Stochastic Continuous Submodular Maximization

We consider maximization of stochastic monotone continuous submodular fu...
research
06/16/2023

MementoHash: A Stateful, Minimal Memory, Best Performing Consistent Hash Algorithm

Consistent hashing is used in distributed systems and networking applica...
research
01/10/2018

Worst-case Optimal Submodular Extensions for Marginal Estimation

Submodular extensions of an energy function can be used to efficiently c...
