Leveraging Sparsity for Efficient Submodular Data Summarization

03/08/2017
by Erik M. Lindgren, et al.

The facility location problem is widely used for summarizing large datasets and has additional applications in sensor placement, image retrieval, and clustering. One difficulty of this problem is that submodular optimization algorithms require the calculation of pairwise benefits for all items in the dataset. This is infeasible for large problems, so recent work proposed calculating only nearest neighbor benefits. A limitation of that work is that it invoked several strong assumptions to obtain provable approximation guarantees. In this paper we establish that these extra assumptions are not necessary: solving the sparsified problem is almost optimal under the standard assumptions of the problem. We then analyze a different method of sparsification, one that better models methods such as Locality Sensitive Hashing used to accelerate the nearest neighbor computations, and extend the use of the problem to a broader family of similarities. We validate our approach by demonstrating that it rapidly generates interpretable summaries.
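
As a rough illustration of the idea in the abstract, the sketch below runs the standard greedy algorithm on a facility location objective after keeping only the largest benefits per item. It is a minimal sketch under stated assumptions, not the authors' implementation: the objective is assumed to take the usual form (sum over items of the best benefit from any selected element), the sparsification is assumed to keep the `t` largest benefits in each row, and all names (`benefits`, `sparsify_top_t`, `greedy`, `facility_location_value`) are hypothetical.

```python
import numpy as np


def facility_location_value(benefits, selected):
    """Assumed objective: each item (row) is served by its best selected column."""
    if not selected:
        return 0.0
    return float(benefits[:, selected].max(axis=1).sum())


def sparsify_top_t(benefits, t):
    """Keep the t largest benefits in each row and zero out the rest.

    Assumption: this stands in for a nearest neighbor sparsification;
    a real pipeline would avoid building the dense matrix at all.
    """
    sparse = np.zeros_like(benefits)
    top = np.argsort(benefits, axis=1)[:, -t:]      # column indices of the t largest per row
    rows = np.arange(benefits.shape[0])[:, None]
    sparse[rows, top] = benefits[rows, top]
    return sparse


def greedy(benefits, k):
    """Standard greedy selection: add the column with the largest marginal gain, k times."""
    n_items, _ = benefits.shape
    best = np.zeros(n_items)                        # current best benefit per item
    selected = []
    for _ in range(k):
        # Marginal gain of each candidate column given the current selection.
        gains = np.maximum(benefits, best[:, None]).sum(axis=0) - best.sum()
        gains[selected] = -np.inf                   # never pick the same column twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, benefits[:, j])
    return selected


# Toy benefit matrix (illustrative values only): rows are items, columns are candidates.
benefits = np.array([
    [1.0, 0.2, 0.1],
    [0.9, 0.3, 0.2],
    [0.1, 0.2, 1.0],
])
sparse = sparsify_top_t(benefits, t=1)
summary = greedy(sparse, k=2)
print(summary, facility_location_value(benefits, summary))
```

The summary is chosen using the sparsified benefits but evaluated on the full matrix, which is the comparison the approximation guarantees in the abstract are about.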

research
06/30/2014

Rates of Convergence for Nearest Neighbor Classification

Nearest neighbor methods are a popular class of nonparametric estimators...
research
11/16/2014

Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval

We present a simple but powerful reinterpretation of kernelized locality...
research
06/27/2012

On the Difficulty of Nearest Neighbor Search

Fast approximate nearest neighbor (NN) search in large databases is beco...
research
07/22/2013

Solving Traveling Salesman Problem by Marker Method

In this paper we use marker method and propose a new mutation operator t...
research
12/11/2021

SLOSH: Set LOcality Sensitive Hashing via Sliced-Wasserstein Embeddings

Learning from set-structured data is an essential problem with many appl...
research
04/09/2023

A Note on "Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms"

Data valuation is a growing research field that studies the influence of...
research
07/17/2018

Supermodular Locality Sensitive Hashes

In this work, we show deep connections between Locality Sensitive Hashab...
