Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures

08/21/2015
by Mario Lucic, et al.

Coresets are efficient representations of data sets such that models trained on the coreset are provably competitive with models trained on the original data set. As such, they have been successfully used to scale up clustering models such as k-means and Gaussian mixture models to massive data sets. Until now, however, the algorithms and the corresponding theory were usually specific to each clustering problem. We propose a single, practical algorithm to construct strong coresets for a large class of hard and soft clustering problems based on Bregman divergences. This class includes hard clustering with popular distortion measures such as the squared Euclidean distance, the Mahalanobis distance, the KL divergence and the Itakura-Saito distance. The corresponding soft clustering problems are directly related to popular mixture models through a dual relationship between Bregman divergences and exponential family distributions. Our theoretical results further imply a randomized polynomial-time approximation scheme for hard clustering. We demonstrate the practicality of the proposed algorithm in an empirical evaluation.
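To make the distortion measures named in the abstract concrete, the following is a minimal sketch of the standard Bregman divergence definition, d_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>, together with the weighted hard clustering cost that a coreset is meant to approximate. The function names (`bregman`, `hard_clustering_cost`) and the generator choices are illustrative assumptions, not the paper's code, and the sketch does not implement the coreset construction itself.

```python
import math

def bregman(phi, grad_phi, p, q):
    """Bregman divergence d_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

# Squared Euclidean distance: phi(x) = ||x||^2
def sq_norm(x):
    return sum(v * v for v in x)

def sq_norm_grad(x):
    return [2.0 * v for v in x]

# (Generalized) KL divergence: phi(x) = sum_i x_i log x_i, entries must be positive
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]

# Itakura-Saito distance: phi(x) = -sum_i log x_i, entries must be positive
def burg(x):
    return -sum(math.log(v) for v in x)

def burg_grad(x):
    return [-1.0 / v for v in x]

def hard_clustering_cost(points, weights, centers, phi, grad_phi):
    """Weighted hard clustering cost: each point pays its divergence to the closest center.

    With unit weights this is the cost on the full data set; with a weighted
    subset it is the cost evaluated on a candidate coreset (hypothetical usage).
    """
    return sum(
        w * min(bregman(phi, grad_phi, x, c) for c in centers)
        for x, w in zip(points, weights)
    )
```

A weighted subset is a useful summary in this sense when `hard_clustering_cost` evaluated on it stays close to the cost on the full data for every choice of `centers`; swapping the `phi`/`grad_phi` pair changes the distortion measure without changing the cost function.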


research
02/27/2017

Scalable and Distributed Clustering via Lightweight Coresets

Coresets are compact representations of data sets such that models train...
research
06/22/2018

A Novel ECOC Algorithm with Centroid Distance Based Soft Coding Scheme

In ECOC framework, the ternary coding strategy is widely deployed in cod...
research
11/23/2017

Clustering Semi-Random Mixtures of Gaussians

Gaussian mixture models (GMM) are the most widely used statistical model...
research
03/23/2012

k-MLE: A fast algorithm for learning statistical mixture models

We describe k-MLE, a fast and efficient local search algorithm for learn...
research
03/23/2017

Training Mixture Models at Scale via Coresets

How can we train a statistical mixture model on a massive data set? In t...
research
12/11/2018

Robust Bregman Clustering

Using a trimming approach, we investigate a k-means type method based on...
research
03/19/2017

Practical Coreset Constructions for Machine Learning

We investigate coresets - succinct, small summaries of large data sets -...
