When Do Birds of a Feather Flock Together? K-Means, Proximity, and Conic Programming

10/16/2017
by Xiaodong Li, et al.

Given a set of data points, a central goal is to group them into clusters based on some notion of similarity between the individual objects. One of the most popular and widely used approaches is K-means, despite the computational hardness of finding its global minimum. We study and compare the properties of different convex relaxations by relating them to corresponding proximity conditions, an idea originally introduced by Kumar and Kannan. Using conic duality theory, we present an improved proximity condition under which the Peng-Wei relaxation of K-means recovers the underlying clusters exactly. Our proximity condition improves upon that of Kumar and Kannan and is comparable to that of Awasthi and Sheffet, where proximity conditions are established for projective K-means. In addition, we provide a necessary proximity condition for the exactness of the Peng-Wei relaxation. For the special case of equal cluster sizes, we establish a different and completely localized proximity condition under which the Amini-Levina relaxation yields exact clustering, thereby addressing an open problem posed by Awasthi and Sheffet in the balanced case. Our framework is not only deterministic and model-free but also carries a clear geometric meaning, which allows for further analysis and generalization. Moreover, it can be conveniently applied to analyzing various generative models for data, such as stochastic ball models and Gaussian mixture models. With this method, we improve the current minimum separation bound for stochastic ball models and achieve state-of-the-art results for learning Gaussian mixture models.
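For context on the computational hardness mentioned above: K-means seeks the partition minimizing the within-cluster sum of squared distances, and the standard heuristic, Lloyd's algorithm, is only guaranteed to reach a local minimum, which is precisely what motivates convex relaxations such as Peng-Wei's. The following is a minimal pure-Python sketch of Lloyd's iteration (not the paper's method), using naive first-k seeding purely for illustration:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def lloyd_kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm for the K-means objective.

    Alternates nearest-center assignment and centroid updates.
    Converges only to a local minimum of the objective, which is
    why exact-recovery guarantees rely on convex relaxations.
    """
    # Naive seeding: take the first k points as initial centers.
    # (In practice, k-means++ seeding is a better choice.)
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its cluster.
        new_centers = []
        for j in range(k):
            cluster = [p for p, l in zip(points, labels) if l == j]
            if cluster:
                new_centers.append(tuple(sum(c) / len(cluster)
                                         for c in zip(*cluster)))
            else:
                new_centers.append(centers[j])  # keep old center if emptied
        if new_centers == centers:  # fixed point reached: local minimum
            break
        centers = new_centers
    return labels, centers

# Two well-separated planar clusters; recovery here is easy, and the
# proximity conditions in the paper quantify when recovery is guaranteed.
pts = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2),
       (5.1, 4.9), (-0.1, 0.1), (4.9, 5.1)]
labels, centers = lloyd_kmeans(pts, 2)
```

On well-separated data like this, Lloyd's algorithm recovers the planted clusters; the paper's contribution is characterizing, via proximity conditions, when the Peng-Wei and Amini-Levina relaxations recover them exactly regardless of initialization.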


