Mixture Models, Robustness, and Sum of Squares Proofs
We use the Sum of Squares method to develop new efficient algorithms for learning well-separated mixtures of Gaussians and robust mean estimation, both in high dimensions, that substantially improve upon the statistical guarantees achieved by previous efficient algorithms. Firstly, we study mixtures of k distributions in d dimensions, where the means of every pair of distributions are separated by at least k^ε. In the special case of spherical Gaussian mixtures, we give a (dk)^{O(1/ε^2)}-time algorithm that learns the means assuming separation at least k^ε, for any ε > 0. This is the first algorithm to improve on greedy ("single-linkage") and spectral clustering, breaking a long-standing barrier for efficient algorithms at separation k^{1/4}. We also study robust estimation. When an unknown (1-ε)-fraction of X_1,...,X_n are chosen from a sub-Gaussian distribution with mean μ but the remaining points are chosen adversarially, we give an algorithm recovering μ to error ε^{1-1/t} in time d^{O(t^2)}, so long as sub-Gaussian-ness up to O(t) moments can be certified by a Sum of Squares proof. This is the first polynomial-time algorithm with guarantees approaching the information-theoretic limit for non-Gaussian distributions. Previous algorithms could not achieve error better than ε^{1/2}. Both of these results are based on a unified technique. Inspired by recent algorithms of Diakonikolas et al. in robust statistics, we devise an SDP based on the Sum of Squares method for the following setting: given X_1,...,X_n ∈ R^d for large d and n = poly(d) with the promise that a subset of X_1,...,X_n were sampled from a probability distribution with bounded moments, recover some information about that distribution.
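
To make the robust-estimation rates above concrete, the following minimal Python sketch tabulates ε^{1-1/t} against the earlier ε^{1/2} barrier for a few values of t. It is only arithmetic on the stated rates, not an implementation of the Sum of Squares algorithm. The function names, the choice ε = 0.01, the restriction t > 1, and the omission of all constant factors hidden in the guarantees are assumptions made for illustration.

```python
def sos_error_rate(eps: float, t: int) -> float:
    """Rate eps^(1 - 1/t) quoted in the abstract, constant factors dropped.

    Hypothetical helper for illustration; it does not run any estimator.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if t <= 1:
        # Assumption made here: t = 1 gives the trivial rate eps^0 = 1.
        raise ValueError("t must be greater than 1")
    return eps ** (1.0 - 1.0 / t)


def previous_error_rate(eps: float) -> float:
    """Rate eps^(1/2) that the abstract attributes to previous algorithms."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return eps ** 0.5


if __name__ == "__main__":
    eps = 0.01  # illustrative corruption fraction, not taken from the source
    print(f"eps = {eps}, previous rate eps^(1/2) = {previous_error_rate(eps):.4f}")
    for t in (2, 4, 8, 16):
        # The running time grows as d^{O(t^2)}, so larger t trades time for accuracy.
        print(f"t = {t:2d}: eps^(1 - 1/t) = {sos_error_rate(eps, t):.4f}")
```

At t = 2 the two rates coincide. As t grows, the rate ε^{1-1/t} decreases toward ε, while the stated running time d^{O(t^2)} grows with t.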