Wasserstein Coresets for Lipschitz Costs

05/18/2018
by Sebastian Claici, et al.

Sparsification is increasingly relevant as very large data sets proliferate. Coresets are a principled way to construct representative weighted subsets of a data set that perform comparably to the full data set on specific problems. However, coreset language neglects the nature of the underlying data distribution, which is often continuous. In this paper, we address this oversight by introducing a notion of measure coresets that generalizes coreset language to arbitrary probability measures. Our definition reveals a surprising connection to optimal transport theory, which we leverage to design a coreset for problems with Lipschitz costs. We validate our construction on support vector machine (SVM) training, k-means clustering, k-median clustering, and linear regression, and show that it is competitive with previous coreset constructions.
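
The link between Lipschitz costs and optimal transport can be illustrated with the standard Kantorovich-Rubinstein bound: for an L-Lipschitz cost f and probability measures mu and nu, |E_mu[f] - E_nu[f]| <= L * W1(mu, nu). The sketch below is not the paper's construction. It assumes one-dimensional data, a hypothetical quantile-based coreset (`quantile_coreset`), and a single-center k-median cost with an arbitrary center, and it only checks the bound numerically.

```python
import random


def wasserstein1_1d(xs, ws, ys, vs):
    """W1 distance between two weighted discrete measures on the real line.

    Uses the 1D identity W1 = integral of |F_mu(t) - F_nu(t)| dt,
    where F denotes the cumulative distribution function.
    """
    # Tag each atom with a signed weight: +w for mu, -v for nu.
    atoms = sorted([(x, w) for x, w in zip(xs, ws)] +
                   [(y, -v) for y, v in zip(ys, vs)])
    total, cdf_gap = 0.0, 0.0
    for (pos, mass), (next_pos, _) in zip(atoms, atoms[1:]):
        cdf_gap += mass  # F_mu - F_nu on the interval [pos, next_pos)
        total += abs(cdf_gap) * (next_pos - pos)
    return total


def quantile_coreset(data, m):
    """Hypothetical coreset: the median of each of m equal-size sorted blocks."""
    s = sorted(data)
    block = len(s) // m
    points = [s[i * block + block // 2] for i in range(m)]
    return points, [1.0 / m] * m


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]
weights = [1.0 / len(data)] * len(data)
core_pts, core_wts = quantile_coreset(data, 10)

center = 0.3  # arbitrary center, chosen only for illustration


def cost(x):
    """1-Lipschitz cost: k-median objective with a single center."""
    return abs(x - center)


full_cost = sum(w * cost(x) for x, w in zip(data, weights))
core_cost = sum(w * cost(x) for x, w in zip(core_pts, core_wts))
w1 = wasserstein1_1d(data, weights, core_pts, core_wts)
gap = abs(full_cost - core_cost)

print(f"cost gap = {gap:.4f}, W1 bound = {w1:.4f}")
```

Because the cost is 1-Lipschitz, the printed cost gap cannot exceed the W1 distance, whichever center is chosen. A small W1 distance between the data and the weighted subset therefore controls the error for every such cost at once.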


