Clustering Semi-Random Mixtures of Gaussians

11/23/2017
by Pranjal Awasthi, et al.

Gaussian mixture models (GMMs) are the most widely used statistical model for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. In our model, a semi-random adversary is allowed to make arbitrary "monotone" or helpful changes to the data generated from the Gaussian mixture model. Our first contribution is a polynomial time algorithm that provably recovers the ground truth up to small classification error with high probability, assuming a certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for k-means clustering, which is the method of choice in practice. Our second result complements the upper bound by giving a nearly matching information-theoretic lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.
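The setting described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm or analysis, only a toy instance under stated assumptions: the abstract does not define a "monotone" change, so the sketch assumes it means moving each point some arbitrary fraction of the way toward the mean of its own component. The function names (`sample_gmm`, `monotone_shift`, `lloyd`), the spherical components, the parameter values, and the initialization (one point taken from each true component) are all hypothetical choices made for illustration.

```python
import random

def sample_gmm(means, n_per_component, sigma, rng):
    """Draw points from a spherical Gaussian mixture; return (points, labels)."""
    points, labels = [], []
    for j, mu in enumerate(means):
        for _ in range(n_per_component):
            points.append([rng.gauss(m, sigma) for m in mu])
            labels.append(j)
    return points, labels

def monotone_shift(points, labels, means, rng):
    """ASSUMED form of a 'helpful' change: move each point an arbitrary
    fraction t in [0, 1) of the way toward its own component mean."""
    shifted = []
    for x, j in zip(points, labels):
        t = rng.random()
        shifted.append([xi + t * (mi - xi) for xi, mi in zip(x, means[j])])
    return shifted

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign_points(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in points]

def lloyd(points, centers, max_iters=50):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    centers = [list(c) for c in centers]
    for _ in range(max_iters):
        assignment = assign_points(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign_points(points, centers)

# Toy instance: three well-separated components in the plane (illustrative values).
rng = random.Random(0)
true_means = [(0.0, 0.0), (20.0, 0.0), (0.0, 20.0)]
points, labels = sample_gmm(true_means, 50, 1.0, rng)
shifted = monotone_shift(points, labels, true_means, rng)

# Hypothetical initialization: one (shifted) point from each true component.
init = [shifted[0], shifted[50], shifted[100]]
centers, assignment = lloyd(shifted, init)
errors = sum(a != b for a, b in zip(assignment, labels))
print("misclassified points:", errors)
```

Because the initial centers are indexed by true component, the cluster labels returned by `lloyd` can be compared with the ground-truth labels directly, without searching over permutations. The per-point fractions `t` here are drawn at random only for convenience; in the semi-random model an adversary would choose the changes.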


research
07/09/2020

K-Means and Gaussian Mixture Modeling with a Separation Constraint

We consider the problem of clustering with K-means and Gaussian mixture ...
research
12/19/2017

Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds

Clustering mixtures of Gaussian distributions is a fundamental and chall...
research
12/08/2020

Algorithms for finding k in k-means

k-means Clustering requires as input the exact value of k, the number of...
research
12/29/2016

Quantum Clustering and Gaussian Mixtures

The mixture of Gaussian distributions, a soft version of k-means, is co...
research
09/01/2019

Gaussian mixture model decomposition of multivariate signals

We propose a greedy variational method for decomposing a non-negative mu...
research
10/03/2021

Information Elicitation Meets Clustering

In the setting where we want to aggregate people's subjective evaluation...
research
08/21/2015

Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures

Coresets are efficient representations of data sets such that models tra...
