Linear Time Clustering for High Dimensional Mixtures of Gaussian Clouds

12/19/2017
by Dan Kushnir, et al.

Clustering mixtures of Gaussian distributions is a fundamental and challenging problem that arises in many high-dimensional data processing tasks. In this paper, we propose a novel and efficient clustering algorithm for n points drawn from a mixture of two Gaussian distributions in R^p. The algorithm performs random 1-dimensional projections until it finds a direction that yields the user-specified clustering error e. For a 1-dimensional separability parameter γ satisfying γ = Q^-1(e), the expected number of such projections is shown to be bounded by o( p), when γ satisfies γ≤ cp, where c is the separability parameter of the two Gaussians in R^p. We show that the square of the 1-dimensional separability resulting from a random projection equals c^2 in expectation, which guarantees a small number of projections in realistic scenarios. Consequently, the expected overall running time of the algorithm is linear in n and quasi-linear in p. This result stands in contrast to prior work, which learns the parameters of the Gaussian mixture model and runs in time that is polynomial, or at best quadratic, in p and n. The new scheme is particularly appealing in the challenging setup where the ambient dimension of the data, p, is very large and yet the number of sample points, n, is small or of the same order as p. We show that the bound on the expected number of 1-dimensional projections extends to mixtures of three or more Gaussian distributions. Finally, we validate these results with numerical experiments in which the proposed algorithm performs within the prescribed accuracy and running time bounds.
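The projection loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made only for illustration: Q is taken to be the standard Gaussian tail function, the 1-dimensional clustering step is a simple two-means split, and the 1-dimensional separability is estimated as the gap between the two cluster means divided by the sum of their standard deviations. All function names (`q_inverse`, `split_1d`, `separability`, `cluster_by_random_projection`) and the `max_projections` cap are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def q_inverse(e):
    """Return gamma with Q(gamma) = e, assuming Q is the Gaussian tail function."""
    return NormalDist().inv_cdf(1.0 - e)


def random_unit_vector(p, rng):
    """Draw a direction uniformly on the unit sphere in R^p."""
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def split_1d(values, iters=20):
    """Two-means on the line (an assumed stand-in for the 1-D clustering step)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        threshold = (lo + hi) / 2.0
        left = [v for v in values if v <= threshold]
        right = [v for v in values if v > threshold]
        if not left or not right:
            break
        lo, hi = fmean(left), fmean(right)
    threshold = (lo + hi) / 2.0
    return [int(v > threshold) for v in values]


def separability(values, labels):
    """Assumed estimate of the 1-D separability: |mean gap| / (std_0 + std_1)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    if not a or not b:
        return 0.0
    spread = pstdev(a) + pstdev(b)
    if spread == 0.0:
        return math.inf
    return abs(fmean(a) - fmean(b)) / spread


def cluster_by_random_projection(points, e, max_projections=1000, seed=None):
    """Project onto random directions until the estimated separability
    reaches gamma = Q^-1(e). Returns (labels, direction, trials) or None
    if no direction qualifies within max_projections attempts."""
    rng = random.Random(seed)
    gamma = q_inverse(e)
    p = len(points[0])
    for trial in range(1, max_projections + 1):
        u = random_unit_vector(p, rng)
        projected = [sum(x_j * u_j for x_j, u_j in zip(x, u)) for x in points]
        labels = split_1d(projected)
        if separability(projected, labels) >= gamma:
            return labels, u, trial
    return None
```

Each trial costs one pass over the n-by-p data, so the total cost is the per-trial cost times the number of projections tried. Under these assumptions, the separability estimate is computed after the split and is therefore biased upward; even a single unseparated Gaussian scores roughly 1.3, so the sketch is only meaningful for small target errors e.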


research
11/23/2017

Clustering Semi-Random Mixtures of Gaussians

Gaussian mixture models (GMM) are the most widely used statistical model...
research
07/26/2022

Efficient Algorithms for Sparse Moment Problems without Separation

We consider the sparse moment problem of learning a k-spike mixture in h...
research
09/19/2023

Worst-Case and Smoothed Analysis of Hartigan's Method for k-Means Clustering

We analyze the running time of Hartigan's method, an old algorithm for t...
research
03/07/2023

Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models

We study the problem of privately estimating the parameters of d-dimensi...
research
08/29/2016

Robust Discriminative Clustering with Sparse Regularizers

Clustering high-dimensional data often requires some form of dimensional...
research
06/13/2018

Pattern Dependence Detection using n-TARP Clustering

Consider an experiment involving a potentially small number of subjects....
research
03/10/2019

One-Pass Sparsified Gaussian Mixtures

We present a one-pass sparsified Gaussian mixture model (SGMM). Given P-...