Efficient Clustering for Stretched Mixtures: Landscape and Optimality

03/22/2020
by Kaizheng Wang, et al.

This paper considers a canonical clustering problem where one receives unlabeled samples drawn from a balanced mixture of two elliptical distributions and aims to build a classifier that estimates the labels. Many popular methods, including PCA and k-means, require the individual components of the mixture to be somewhat spherical and perform poorly when they are stretched. To overcome this issue, we propose a non-convex program that seeks an affine transform turning the data into a one-dimensional point cloud concentrated around -1 and 1, after which clustering becomes easy. Our theoretical contributions are two-fold: (1) we show that the non-convex loss function exhibits desirable landscape properties as long as the sample size exceeds some constant multiple of the dimension, and (2) we leverage this to prove that an efficient first-order algorithm achieves near-optimal statistical precision even without good initialization. We also propose a general methodology for multi-class clustering tasks with flexible choices of feature transforms and loss objectives.
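
The idea of fitting an affine map whose outputs concentrate around -1 and 1 can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the quartic loss `(u^2 - 1)^2 / 4`, the penalty `0.5 * mean(u)^2` that encourages balanced clusters, the whitening step (a preconditioning choice for numerical convenience), the random restarts, and the function names `make_stretched_mixture`, `whiten`, `loss_and_grad`, and `fit`.

```python
import numpy as np

def make_stretched_mixture(n, rng):
    """Balanced two-component Gaussian mixture, stretched along the second axis."""
    y = rng.choice([-1.0, 1.0], size=n)
    mu = np.array([1.0, 0.0])
    # Small noise along the mean direction, large noise orthogonal to it.
    noise = rng.standard_normal((n, 2)) * np.array([np.sqrt(0.05), 5.0])
    return y[:, None] * mu + noise, y

def whiten(X):
    """Center the data and rescale it to identity sample covariance (assumed preprocessing)."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def loss_and_grad(alpha, beta, X):
    """Assumed objective: mean((u^2 - 1)^2) / 4 + 0.5 * mean(u)^2 with u = alpha + X @ beta."""
    u = alpha + X @ beta
    m = u.mean()
    loss = np.mean((u ** 2 - 1.0) ** 2) / 4.0 + 0.5 * m ** 2
    g = ((u ** 3 - u) + m) / len(u)  # derivative of the loss w.r.t. each u_i
    return loss, g.sum(), X.T @ g

def fit(X, rng, restarts=5, steps=3000, lr=0.1):
    """Plain gradient descent from small random starts; keep the lowest-loss run."""
    best = None
    for _ in range(restarts):
        alpha, beta = 0.0, 0.1 * rng.standard_normal(X.shape[1])
        for _ in range(steps):
            _, grad_alpha, grad_beta = loss_and_grad(alpha, beta, X)
            alpha -= lr * grad_alpha
            beta -= lr * grad_beta
        final_loss = loss_and_grad(alpha, beta, X)[0]
        if best is None or final_loss < best[0]:
            best = (final_loss, alpha, beta)
    return best

rng = np.random.default_rng(0)
X, y = make_stretched_mixture(2000, rng)
Xw = whiten(X)
final_loss, alpha, beta = fit(Xw, rng)
pred = np.sign(alpha + Xw @ beta)
# Cluster labels are only identified up to a global sign flip.
accuracy = max(np.mean(pred == y), np.mean(pred == -y))
print(f"loss={final_loss:.3f}, accuracy={accuracy:.3f}")
```

In this toy setting the direction of largest variance is orthogonal to the direction separating the clusters, which is the kind of stretched geometry where projecting onto the top principal component would be uninformative. The sketch only illustrates the one-dimensional "concentrate around -1 and 1" objective; it makes no claim to reproduce the paper's exact loss, algorithm, or guarantees.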


research
10/04/2021

Clustering a Mixture of Gaussians with Unknown Covariance

We investigate a clustering problem with data from a mixture of Gaussian...
research
05/18/2021

On Convex Clustering Solutions

Convex clustering is an attractive clustering algorithm with favorable p...
research
04/08/2019

On a class of distributions generated by stochastic mixture of the extreme order statistics of a sample of size two

This paper considers a family of distributions constructed by a stochast...
research
02/13/2018

Fast Global Convergence via Landscape of Empirical Loss

While optimizing convex objective (loss) functions has been a powerhouse...
research
06/27/2012

Copula Mixture Model for Dependency-seeking Clustering

We introduce a copula mixture model to perform dependency-seeking cluste...
research
07/23/2018

A computational geometry method for the inverse scattering problem

In this paper we demonstrate a computational method to solve the inverse...
research
10/02/2014

Mapping Energy Landscapes of Non-Convex Learning Problems

In many statistical learning problems, the target functions to be optimi...
