Clustering Mixtures with Almost Optimal Separation in Polynomial Time

12/01/2021
by   Jerry Li, et al.

We consider the problem of clustering mixtures of mean-separated Gaussians in high dimensions. We are given samples from a mixture of k identity-covariance Gaussians such that the distance between any pair of means is at least Δ, for some parameter Δ > 0, and the goal is to recover the ground-truth clustering of these samples. It is folklore that separation Δ = Θ(√(log k)) is both necessary and sufficient to recover a good clustering, at least information-theoretically. However, the estimators which achieve this guarantee are inefficient. We give the first algorithm which runs in polynomial time and almost matches this guarantee. More precisely, we give an algorithm which takes polynomially many samples and time, and which successfully recovers a good clustering so long as the separation is Δ = Ω(log^(1/2 + c) k), for any c > 0. Previously, polynomial-time algorithms were only known for this problem when the separation was polynomial in k, and all algorithms which could tolerate poly(log k) separation required quasipolynomial time. We also extend our result to mixtures of translations of a distribution which satisfies the Poincaré inequality, under additional mild assumptions. Our main technical tool, which we believe is of independent interest, is a novel way to implicitly represent and estimate high-degree moments of a distribution, which allows us to extract important information about these moments without ever writing down the full moment tensors explicitly.
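
To make the problem setup concrete, the following is a minimal sketch of the mixture model and of why the separation scale √(log k) matters. It is not the algorithm of the paper: it uses an oracle that already knows the true means and assigns each sample to the nearest one, so it only illustrates when a good clustering exists at all. The function names, the choice of dimension d = k, the placement of the means on scaled basis vectors (which makes every pairwise distance exactly Δ), and the uniform mixing weights are all assumptions made for illustration.

```python
import math
import random


def make_means(k, delta):
    """Return k means in R^k whose pairwise distances all equal delta.

    Assumption for illustration: means sit on scaled standard basis vectors,
    so ||mu_i - mu_j|| = sqrt(2) * scale = delta for every i != j.
    """
    scale = delta / math.sqrt(2.0)
    return [[scale if i == j else 0.0 for j in range(k)] for i in range(k)]


def sample_mixture(means, n, rng):
    """Draw n points from the uniform mixture of N(mu_i, I) with their labels."""
    points, labels = [], []
    for _ in range(n):
        i = rng.randrange(len(means))
        points.append([m + rng.gauss(0.0, 1.0) for m in means[i]])
        labels.append(i)
    return points, labels


def nearest_mean(x, means):
    """Index of the mean closest to x in Euclidean distance."""
    def sqdist(mu):
        return sum((a - b) ** 2 for a, b in zip(x, mu))
    return min(range(len(means)), key=lambda i: sqdist(means[i]))


def oracle_error(k, delta, n=2000, seed=0):
    """Fraction of samples misassigned by the oracle that knows the true means."""
    rng = random.Random(seed)
    means = make_means(k, delta)
    points, labels = sample_mixture(means, n, rng)
    wrong = sum(nearest_mean(x, means) != y for x, y in zip(points, labels))
    return wrong / n


if __name__ == "__main__":
    k = 8
    for c in (0.5, 1.0, 2.0, 4.0):
        delta = c * math.sqrt(math.log(k))
        print(f"delta = {delta:.2f}: oracle error = {oracle_error(k, delta):.3f}")
```

In this toy setting, a union bound over the other k - 1 components shows the oracle's error is at most (k - 1) times the Gaussian tail probability at Δ/2, which is small once Δ is a sufficiently large multiple of √(log k). The hard part addressed by the paper is achieving comparable clustering quality in polynomial time without knowing the means.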


