Robustly Learning any Clusterable Mixture of Gaussians

by Ilias Diakonikolas, et al.

We study the efficient learnability of high-dimensional Gaussian mixtures in the outlier-robust setting, where a small constant fraction of the data is adversarially corrupted. We resolve the polynomial learnability of this problem when the components are pairwise separated in total variation distance. Specifically, we provide an algorithm that, for any constant number of components k, runs in polynomial time and learns the components of an ϵ-corrupted k-mixture within the information-theoretically near-optimal error of Õ(ϵ), under the assumption that the overlap between any pair of components P_i, P_j (i.e., the quantity 1 - TV(P_i, P_j)) is bounded by poly(ϵ). Our separation condition is the qualitatively weakest assumption under which accurate clustering of the samples is possible: in particular, it allows for components with arbitrary covariances and for components with identical means, as long as their covariances differ sufficiently. Ours is the first polynomial-time algorithm for this problem, even for k = 2. Our algorithm follows the Sum-of-Squares-based proofs-to-algorithms approach. Our main technical contribution is a new robust identifiability proof for the clusters of a Gaussian mixture that can be captured by the constant-degree Sum-of-Squares proof system. The key ingredients of this proof are a novel use of SoS-certifiable anti-concentration and a new characterization of pairs of Gaussians with small (dimension-independent) overlap in terms of their parameter distance.
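To make the separation condition concrete, the following is a small illustrative sketch, not part of the paper's algorithm: for two equal-variance univariate Gaussians, the total variation distance has a closed form, TV = 2Φ(|μ₁ − μ₂|/(2σ)) − 1, so the overlap 1 − TV(P₁, P₂) can be computed exactly and compared against poly(ϵ). The helper `components_separated` and its constant `c` are hypothetical names chosen here purely for illustration, with poly(ϵ) instantiated as c·ϵ.

```python
import math


def tv_equal_variance(mu1: float, mu2: float, sigma: float) -> float:
    """Total variation distance between N(mu1, sigma^2) and N(mu2, sigma^2).

    For equal variances the closed form is
        TV = 2 * Phi(|mu1 - mu2| / (2*sigma)) - 1
           = erf(|mu1 - mu2| / (2 * sqrt(2) * sigma)).
    """
    return math.erf(abs(mu1 - mu2) / (2.0 * math.sqrt(2.0) * sigma))


def components_separated(mu1: float, mu2: float, sigma: float,
                         eps: float, c: float = 1.0) -> bool:
    """Toy check of a separation condition in the paper's style:
    the overlap 1 - TV(P1, P2) must be at most poly(eps), here
    instantiated (for illustration only) as c * eps."""
    overlap = 1.0 - tv_equal_variance(mu1, mu2, sigma)
    return overlap <= c * eps


# Two well-separated components: means 10 standard deviations apart.
print(tv_equal_variance(0.0, 10.0, 1.0))        # close to 1
print(components_separated(0.0, 10.0, 1.0, 0.01))
```

Note that in the high-dimensional setting of the paper no such closed form exists; the result instead relates small overlap to parameter distance, which this one-dimensional toy case only hints at.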




