Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming

05/29/2023
by Yubo Zhuang, et al.

K-means clustering is a widely used machine learning method for identifying patterns in large datasets. Semidefinite programming (SDP) relaxations of the K-means optimization problem have recently been proposed that enjoy strong statistical optimality guarantees, but the prohibitive cost of running an SDP solver puts these guarantees out of reach for practical datasets. By contrast, nonnegative matrix factorization (NMF) is a simple clustering algorithm that is widely used by machine learning practitioners, but it lacks both a solid statistical underpinning and rigorous guarantees. In this paper, we describe an NMF-like algorithm that solves a nonnegative low-rank restriction of the SDP-relaxed K-means formulation using a nonconvex Burer–Monteiro factorization approach. The resulting algorithm is as simple and scalable as state-of-the-art NMF algorithms while enjoying the same strong statistical optimality guarantees as the SDP. In our experiments, our algorithm achieves substantially smaller mis-clustering errors than existing state-of-the-art methods.
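
The abstract describes the method only at a high level. As a hedged illustration, the sketch below assumes the commonly used SDP relaxation of K-means due to Peng and Wei: maximize <A, Z> with A = XX^T, subject to Z positive semidefinite, Z >= 0 entrywise, tr(Z) = K and Z1 = 1. It substitutes the nonnegative low-rank factorization Z = UU^T with U >= 0 of size n x r, enforces UU^T1 = 1 with an augmented Lagrangian, and keeps ||U||_F^2 = K by projection after each gradient step. The function names (`project_nonneg_sphere`, `nonneg_bm_kmeans`), the step size, the penalty `beta`, the iteration counts and the argmax rounding are all illustrative assumptions, not the authors' algorithm or tuning.

```python
import numpy as np


def project_nonneg_sphere(V, K, fallback):
    """Euclidean projection of V onto {U >= 0, ||U||_F^2 = K}.

    When V has a positive entry, the closest such point is
    sqrt(K) * max(V, 0) / ||max(V, 0)||_F. Otherwise that formula does not
    apply, and this sketch simply returns `fallback` (a simplification).
    """
    Vp = np.maximum(V, 0.0)
    nrm = np.linalg.norm(Vp)  # Frobenius norm
    if nrm == 0.0:
        return fallback
    return np.sqrt(K) * Vp / nrm


def nonneg_bm_kmeans(X, K, r=None, beta=1.0, n_outer=100, n_inner=50, seed=0):
    """Hypothetical sketch: augmented Lagrangian on Z = U U^T with U >= 0.

    Minimizes -<A, U U^T> + y^T g + (beta/2)||g||^2, where g = U U^T 1 - 1,
    over {U >= 0, ||U||_F^2 = K}, with multiplier updates y <- y + beta * g.
    """
    n = X.shape[0]
    r = K if r is None else r
    Xc = X - X.mean(axis=0)        # centering shifts <A, Z> by a constant when Z1 = 1
    A = Xc @ Xc.T
    A = A / np.linalg.norm(A, 2)   # scale so the largest singular value of A is 1
    ones = np.ones(n)
    rng = np.random.default_rng(seed)
    U = project_nonneg_sphere(rng.random((n, r)), K,
                              np.full((n, r), np.sqrt(K / (n * r))))
    y = np.zeros(n)                # multipliers for the row-sum constraint
    # Crude step size: objective curvature <= 2, penalty curvature <= 4*beta*n*K.
    eta = 0.5 / (2.0 + 4.0 * beta * n * K)
    for _ in range(n_outer):
        for _ in range(n_inner):
            g = U @ (U.T @ ones) - ones          # residual of U U^T 1 = 1
            w = y + beta * g
            # Gradient: -2AU + w (1^T U) + 1 (w^T U)
            grad = -2.0 * A @ U + np.outer(w, U.sum(axis=0)) + np.outer(ones, U.T @ w)
            U = project_nonneg_sphere(U - eta * grad, K, U)
        y = y + beta * (U @ (U.T @ ones) - ones)  # multiplier (dual ascent) step
    labels = U.argmax(axis=1)                      # assumed rounding: largest entry per row
    return U, labels


# Example usage on three synthetic Gaussian blobs (20 points each).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(size=(20, 2)) for c in centers])
U, labels = nonneg_bm_kmeans(X, K=3)
```

The rescaling step is the exact Euclidean projection onto the intersection of the nonnegative orthant and the sphere whenever V has a positive entry, because the nearest point is then proportional to max(V, 0). Whether the iterates reach the clustering that the paper's guarantees describe depends on the step size, the penalty and the number of iterations, none of which this sketch tunes.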

Related research

Rethinking Symmetric Matrix Factorization: A More General and Better Clustering Perspective (09/06/2022)
Nonnegative matrix factorization (NMF) is widely used for clustering wit...

On Identifiability of Nonnegative Matrix Factorization (09/02/2017)
In this letter, we propose a new identification criterion that guarantee...

Adaptive Graph via Multiple Kernel Learning for Nonnegative Matrix Factorization (08/19/2012)
Nonnegative Matrix Factorization (NMF) has been continuously evolving in...

Rank-One NMF-Based Initialization for NMF and Relative Error Bounds under a Geometric Assumption (12/27/2016)
We propose a geometric assumption on nonnegative data matrices such that...

Clustering by Low-Rank Doubly Stochastic Matrix Decomposition (06/18/2012)
Clustering analysis by nonnegative low-rank approximations has achieved ...

Frank-Wolfe Optimization for Symmetric-NMF under Simplicial Constraint (06/20/2017)
We propose a Frank-Wolfe (FW) solver to optimize the symmetric nonnegati...

Community detection using fast low-cardinality semidefinite programming (12/04/2020)
Modularity maximization has been a fundamental tool for understanding th...