Structures of Spurious Local Minima in k-means

02/16/2020
by   Wei Qian, et al.

k-means clustering is a fundamental problem in unsupervised learning. The problem concerns finding a partition of the data points into k clusters such that the within-cluster variation is minimized. Despite its importance and wide applicability, a theoretical understanding of the k-means problem has not been completely satisfactory. Existing algorithms with theoretical performance guarantees often rely on sophisticated (sometimes artificial) algorithmic techniques and restrictive assumptions on the data. The main challenge lies in the non-convex nature of the problem; in particular, there exist local solutions other than the global optimum. Moreover, the simplest and most popular algorithm for k-means, namely Lloyd's algorithm, generally converges to such spurious local solutions both in theory and in practice. In this paper, we approach the k-means problem from a new perspective, by investigating the structures of these spurious local solutions under a probabilistic generative model with k ground truth clusters. As soon as k=3, spurious local minima provably exist, even for well-separated and balanced clusters. One such local minimum puts two centers at one true cluster, and the third center in the middle of the other two true clusters. For general k, one local minimum puts multiple centers at a true cluster, and one center in the middle of multiple true clusters. Perhaps surprisingly, we prove that this is essentially the only type of spurious local minimum under a separation condition. Our results pertain to the k-means formulation for mixtures of Gaussians or bounded distributions. Our theoretical results corroborate existing empirical observations and provide justification for several improved algorithms for k-means clustering.
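The k=3 configuration described in the abstract (two centers in one true cluster, the third midway between the other two) can be reproduced numerically. The following is a minimal sketch, not the paper's experiment: the cluster means, the standard deviation of 0.5, the sample size, the random seed, and the helper names `lloyd` and `kmeans_cost` are all illustrative assumptions. It runs plain Lloyd iterations from a good and a bad initialization on three well-separated, balanced Gaussian clusters and compares the resulting costs.

```python
import numpy as np

def lloyd(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = centers.copy()
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def kmeans_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).sum())

# Assumed toy data: three balanced, well-separated Gaussian clusters on a line.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
points = np.concatenate(
    [m + 0.5 * rng.standard_normal((200, 2)) for m in true_means]
)

# Good start: one center near each true cluster.
good_init = true_means + 0.3
# Bad start: two centers in the first cluster, one between the other two.
bad_init = np.array([[-0.3, 0.0], [0.3, 0.0], [15.0, 0.0]])

good_centers, _ = lloyd(points, good_init)
bad_centers, _ = lloyd(points, bad_init)
good_cost = kmeans_cost(points, good_centers)
bad_cost = kmeans_cost(points, bad_centers)

print("good centers:\n", good_centers.round(2), "\ncost:", round(good_cost, 1))
print("bad centers:\n", bad_centers.round(2), "\ncost:", round(bad_cost, 1))
```

From the bad start, Lloyd's algorithm stops with two centers splitting the first cluster and the third center stranded between the other two, at a much higher cost than the solution found from the good start. This matches the structure the abstract describes, but it only illustrates it on one synthetic data set.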


