Diffusion K-means clustering on manifolds: provable exact recovery via semidefinite relaxations

03/11/2019
by   Xiaohui Chen, et al.

We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes within-cluster connectedness as measured by the diffusion distance. Diffusion K-means constructs a random walk on the similarity graph whose vertices are data points randomly sampled from the manifolds and whose edge weights are similarities given by a kernel that captures the local geometry of the manifolds. Diffusion K-means is thus a multi-scale clustering tool suited to data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm based on semidefinite programming (SDP) to solve diffusion K-means. We also propose a nuclear norm (i.e., trace norm) regularized SDP that adapts to the number of clusters. In both cases, we show that the SDPs for diffusion K-means achieve exact recovery under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose localized diffusion K-means, which uses a locally adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
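
The random-walk construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the kernel or the exact form of the diffusion distance, so the sketch assumes a Gaussian kernel with a fixed bandwidth and the standard diffusion-map distance (squared differences between t-step transition rows, weighted by the inverse stationary distribution). The function names, the `bandwidth` and `steps` parameters, and the toy data are all hypothetical, and the SDP relaxation itself is not shown.

```python
import math


def gaussian_kernel(points, bandwidth):
    """Pairwise similarities exp(-||x - y||^2 / (2 h^2)); the kernel choice is an assumption."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel


def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def diffusion_distances(points, bandwidth, steps):
    """Squared diffusion distances after `steps` (>= 1) steps of the random walk."""
    n = len(points)
    kernel = gaussian_kernel(points, bandwidth)
    degrees = [sum(row) for row in kernel]
    # Row-normalize the kernel to get the transition matrix of the random walk.
    transition = [[kernel[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # Stationary distribution of the walk is proportional to the vertex degrees.
    total = sum(degrees)
    stationary = [d / total for d in degrees]
    # t-step transition probabilities P^t.
    walk = transition
    for _ in range(steps - 1):
        walk = matmul(walk, transition)
    # Compare the t-step distributions started from each pair of points.
    return [[sum((walk[i][k] - walk[j][k]) ** 2 / stationary[k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
dist = diffusion_distances(points, bandwidth=0.5, steps=3)
print("within-cluster :", dist[0][1])
print("between-cluster:", dist[0][3])
```

On this toy data the walk rarely crosses between the two groups, so points in the same group have nearly identical t-step distributions and a small diffusion distance, while points in different groups are far apart. A clustering objective built on these distances therefore rewards within-cluster connectedness, which is the quantity the abstract says diffusion K-means maximizes.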

