Likelihood adjusted semidefinite programs for clustering heterogeneous data

09/29/2022
by   Yubo Zhuang, et al.

Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework for tackling data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, posing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that, in each iteration, maximizes a surrogate function minorizing the log-likelihood of the observed data. However, it suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxation of K-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters, and we propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroid estimation, a key feature that allows exact recovery when the centroids are well separated, without being trapped by their adversarial configurations. Thus iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP can achieve lower mis-clustering errors than several widely used clustering methods, including K-means, SDP and EM algorithms.
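To illustrate the idea of lifting cluster assignments to membership matrices, the following is a minimal sketch of the classic lifting behind SDP-relaxed K-means. It is not the iLA-SDP method itself, which the abstract does not specify in detail. The toy data and all function names here are hypothetical and chosen only for illustration. The sketch checks that the K-means cost computed with explicit centroids matches the cost computed from pairwise distances and a normalized membership matrix Z, so no centroid is needed in the lifted form.

```python
# Illustrative sketch only: the standard membership-matrix form of the
# K-means objective, NOT the authors' iLA-SDP algorithm. All names and the
# toy data are hypothetical. Assumes every cluster is non-empty.

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_cost_centroids(points, labels, k):
    """Within-cluster sum of squares computed via explicit centroids."""
    dim = len(points[0])
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        cost += sum(sq_dist(p, centroid) for p in members)
    return cost

def membership_matrix(labels, k):
    """Z[i][j] = 1/n_c if points i and j share cluster c, else 0."""
    n = len(labels)
    sizes = [labels.count(c) for c in range(k)]
    return [
        [1.0 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def kmeans_cost_lifted(points, Z):
    """Same cost from pairwise distances only: 0.5 * sum_ij Z_ij * ||x_i - x_j||^2."""
    n = len(points)
    return 0.5 * sum(
        Z[i][j] * sq_dist(points[i], points[j]) for i in range(n) for j in range(n)
    )

# Toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 2.0), (5.0, 4.0)]
labels = [0, 0, 1, 1, 1]

Z = membership_matrix(labels, 2)
cost_a = kmeans_cost_centroids(points, labels, 2)
cost_b = kmeans_cost_lifted(points, Z)
print(cost_a, cost_b)  # the two costs agree up to floating-point rounding
```

In the standard SDP relaxation of K-means, the combinatorial set of such matrices Z is replaced by a convex set: Z is positive semidefinite and entrywise nonnegative, each row sums to one, and the trace equals the number of clusters K. The matrix built above satisfies the row-sum and trace conditions by construction.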


research
01/05/2020

Cutoff for exact recovery of Gaussian mixture models

We determine the cutoff value on separation of cluster centers for exact...
research
12/25/2013

Robust EM algorithm for model-based curve clustering

Model-based clustering approaches concern the paradigm of exploratory da...
research
02/06/2013

An Information-Theoretic Analysis of Hard and Soft Assignment Methods for Clustering

Assignment methods are at the heart of many algorithms for unsupervised ...
research
07/08/2020

Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

We study two practically important cases of model based clustering using...
research
01/20/2022

Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering

Semidefinite programming (SDP) is a powerful tool for tackling a wide ra...
research
09/26/2020

An Adaptive EM Accelerator for Unsupervised Learning of Gaussian Mixture Models

We propose an Anderson Acceleration (AA) scheme for the adaptive Expecta...
research
12/18/2019

Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces

We present an approach for efficiently training Gaussian Mixture Models ...
