Convex Clustering: Model, Theoretical Guarantee and Efficient Algorithm

10/04/2018
by Defeng Sun et al.

Clustering is a fundamental problem in unsupervised learning. Popular methods such as K-means may perform poorly because they are prone to getting stuck in local minima. Recently, the sum-of-norms (SON) model, also known as the clustering path, has been proposed in Pelckmans et al. (2005), Lindsten et al. (2011) and Hocking et al. (2011). The perfect recovery properties of the convex clustering model with uniformly weighted all-pairwise-differences regularization have been proved by Zhu et al. (2014) and Panahi et al. (2017). However, no theoretical guarantee has been established for the general weighted convex clustering model, for which better empirical results have been observed. On the numerical optimization side, although algorithms such as the alternating direction method of multipliers (ADMM) and the alternating minimization algorithm (AMA) have been proposed to solve the convex clustering model (Chi and Lange, 2015), solving large-scale problems remains very challenging. In this paper, we establish sufficient conditions for the perfect recovery guarantee of the general weighted convex clustering model, which include and improve existing theoretical results as special cases. In addition, we develop a semismooth Newton based augmented Lagrangian method for solving large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results show the superior performance and scalability of our algorithm compared with existing first-order methods. In particular, our algorithm is able to solve a convex clustering problem with 200,000 points in R^3 in about 6 minutes.
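For readers new to the model, the weighted convex clustering (SON) problem is commonly written, for data points a_1, ..., a_n in R^d, as

$$\min_{x_1,\ldots,x_n \in \mathbb{R}^d} \; \frac{1}{2}\sum_{i=1}^{n} \|x_i - a_i\|_2^2 \;+\; \gamma \sum_{i<j} w_{ij}\, \|x_i - x_j\|_2, \qquad \gamma > 0,\; w_{ij} \ge 0,$$

where points whose optimal centroids x_i coincide are placed in the same cluster, and uniform weights w_ij = 1 give the all-pairwise-differences case studied by Zhu et al. (2014) and Panahi et al. (2017). This notation is an assumption based on the standard formulation; the abstract does not state the formula. The sketch below solves the model with plain scaled-form ADMM, one of the first-order baselines the abstract mentions, and is not the semismooth Newton based augmented Lagrangian method the paper proposes. The function names, the penalty parameter `nu`, the tolerances, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np


def convex_clustering_admm(A, weights, gamma, nu=1.0, max_iter=5000, tol=1e-8):
    """Plain scaled-form ADMM for the weighted convex clustering (SON) model.

    A       : (n, d) data matrix, one point per row.
    weights : dict mapping pairs (i, j), i < j, to weights w_ij >= 0.
    gamma   : regularization parameter gamma from the model.
    nu      : ADMM penalty parameter (hypothetical default, not tuned).
    Returns the (n, d) matrix of centroids, one row per point.
    """
    n, _ = A.shape
    edges = [(i, j) for (i, j), w in weights.items() if w > 0]
    w = np.array([weights[e] for e in edges])
    # Edge-incidence matrix E: row l has +1 at i and -1 at j, so (E U)_l = u_i - u_j.
    E = np.zeros((len(edges), n))
    for l, (i, j) in enumerate(edges):
        E[l, i], E[l, j] = 1.0, -1.0
    # The U-update matrix (I + nu E^T E) never changes; inverting it once is
    # only reasonable for a tiny illustrative n.
    M_inv = np.linalg.inv(np.eye(n) + nu * E.T @ E)
    U = A.copy()
    V = E @ U                 # splitting variable, V = E U at convergence
    Z = np.zeros_like(V)      # scaled dual variable
    for _ in range(max_iter):
        # U-update: minimize 1/2||U - A||^2 + nu/2 ||E U - V + Z||^2.
        U = M_inv @ (A + nu * E.T @ (V - Z))
        Y = E @ U + Z
        # V-update: row-wise group soft-thresholding, the prox of (gamma*w_l/nu)*||.||_2.
        norms = np.linalg.norm(Y, axis=1)
        shrink = np.maximum(0.0, 1.0 - (gamma * w / nu) / np.maximum(norms, 1e-12))
        V_old = V
        V = shrink[:, None] * Y
        R = E @ U - V         # primal residual
        Z = Z + R
        # Stop when both primal and dual residuals are small.
        if np.linalg.norm(R) < tol and nu * np.linalg.norm(E.T @ (V - V_old)) < tol:
            break
    return U


def clusters_from_centroids(U, tol=1e-3):
    """Give the same label to points whose centroids coincide up to tol."""
    labels = -np.ones(len(U), dtype=int)
    k = 0
    for i in range(len(U)):
        if labels[i] >= 0:
            continue
        close = np.linalg.norm(U - U[i], axis=1) < tol
        labels[close & (labels < 0)] = k
        k += 1
    return labels


# Toy data: two tight groups of three points each (hypothetical example).
A = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]], dtype=float)
n = len(A)
uniform = {(i, j): 1.0 for i in range(n) for j in range(i + 1, n)}
U = convex_clustering_admm(A, uniform, gamma=0.5)
labels = clusters_from_centroids(U)
print(labels)  # prints [0 0 0 1 1 1] on this toy data
```

With uniform weights and gamma = 0.5, each tight group fuses to one centroid while the two centroids are pulled toward each other but stay distinct (about (1.094, 1.094) and (3.973, 3.973) here); a sufficiently large gamma would merge all six points into a single cluster.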


