Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering

01/20/2022
by Yubo Zhuang, et al.

Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice and scale poorly to large (or even moderate) datasets. In this paper, we introduce a linear-time algorithm for approximating an SDP-relaxed K-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. We show that the SL approach enjoys an exact recovery threshold similar to that of the K-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive, with enhanced theoretical properties, when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the proposed method is statistically more accurate than state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable in accuracy to the original K-means SDP with substantially reduced runtime.
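The subsample-then-lift structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sketch_and_lift`, `nearest_centroid`, `lloyd_placeholder`) and the `sample_frac` parameter are hypothetical, uniform sampling without replacement is an assumption, and the SDP solve on the subsample is replaced by a simple Lloyd-style placeholder so that the example runs with NumPy alone. In practice, one would pass an actual K-means SDP solver as `solve_subsample`.

```python
import numpy as np


def nearest_centroid(X, centers):
    """Assign each row of X to the index of its closest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_placeholder(X_sub, K, n_iter=20):
    """Stand-in for the SDP solve on the subsample (this is NOT an SDP).

    Uses farthest-first initialization followed by Lloyd iterations.
    """
    centers = [X_sub[0]]
    for _ in range(K - 1):
        # Distance from every point to its closest already-chosen center.
        d2 = np.min([((X_sub - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X_sub[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = nearest_centroid(X_sub, centers)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X_sub[labels == k].mean(axis=0)
    return nearest_centroid(X_sub, centers)


def sketch_and_lift(X, K, sample_frac=0.1, solve_subsample=lloyd_placeholder, seed=0):
    """Cluster a subsample, then lift the result to all points.

    Assumes every cluster is represented in the subsample; otherwise the
    centroid of an empty cluster is undefined.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(K, int(round(sample_frac * n)))
    # Sketch: draw a uniform subsample without replacement (an assumption).
    idx = rng.choice(n, size=m, replace=False)
    X_sub = X[idx]
    # Solve the (placeholder) clustering problem on the subsample only.
    sub_labels = solve_subsample(X_sub, K)
    # Estimate one centroid per cluster from the subsample solution.
    centers = np.array([X_sub[sub_labels == k].mean(axis=0) for k in range(K)])
    # Lift: nearest-centroid rounding over the full dataset.
    return nearest_centroid(X, centers), centers
```

Calling `labels, centers = sketch_and_lift(X, K=2)` on an `(n, d)` array returns one label per row of `X`. The expensive solver only ever sees the subsample, while the lifting step costs O(nKd), which is where the linear dependence on n comes from in this sketch.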


