List-Decodable Mean Estimation in Nearly-PCA Time

11/19/2020
by   Ilias Diakonikolas, et al.

Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority 1/k fraction of the dataset is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new list-decodable mean estimation algorithm for bounded covariance distributions with optimal sample complexity and error rate, running in nearly-PCA time. Assuming the ground truth distribution on ℝ^d has bounded covariance, our algorithm outputs a list of O(k) candidate means, one of which is within distance O(√k) of the truth. Our algorithm runs in time Õ(ndk) for all k = O(√d) ∪ Ω(d), where n is the size of the dataset. We also show that a variant of our algorithm has runtime Õ(ndk) for all k, at the expense of an O(√(log k)) factor in the recovery guarantee. This runtime matches, up to logarithmic factors, the cost of performing a single k-PCA on the data, which is a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest list-decodable mean estimation algorithms had runtimes Õ(n^2 d k^2) and Õ(nd k^{≥6}). Our approach builds on a novel soft downweighting method, SIFT, which is arguably the simplest known polynomial-time mean estimation technique in the list-decodable learning setting. To develop our fast algorithms, we reduce the computational cost of SIFT via a careful "win-win-win" analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which we believe may be of independent interest.
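To make the problem setting concrete, the following is a minimal Python sketch of the list-decodable guarantee only; it is not the SIFT algorithm or any method from the paper. The helper names (`make_dataset`, `empirical_mean`, `list_error`), the decoy-cluster placement, and the use of constant 1 in the O(√k) bound are all assumptions made for illustration. The candidate list is built with oracle knowledge of which points are inliers, which a real algorithm would not have.

```python
import math
import random


def make_dataset(n, d, k, true_mean, seed=0):
    """Hypothetical generator: about n/k inliers with identity covariance
    around true_mean, the remaining points spread over k-1 decoy clusters."""
    rng = random.Random(seed)
    n_in = n // k
    points = [[m + rng.gauss(0, 1) for m in true_mean] for _ in range(n_in)]
    for i in range(n - n_in):
        c = 1 + i % (k - 1)          # index of the decoy cluster
        center = [10.0 * c] * d      # decoy centers lie far from the truth
        points.append([m + rng.gauss(0, 1) for m in center])
    return points, n_in


def empirical_mean(points):
    """Coordinate-wise average of a list of points."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def list_error(candidates, true_mean):
    """List-decoding error: distance from the truth to the closest candidate."""
    return min(math.dist(c, true_mean) for c in candidates)


n, d, k = 400, 5, 4
true_mean = [0.0] * d
points, n_in = make_dataset(n, d, k, true_mean)

# A single estimate (the plain empirical mean) is dragged toward the decoys.
naive_err = math.dist(empirical_mean(points), true_mean)

# Oracle candidate list (assumption: inlier indices and decoy centers known).
candidates = [empirical_mean(points[:n_in])]
candidates += [[10.0 * c] * d for c in range(1, k)]
list_err = list_error(candidates, true_mean)

print(f"naive error: {naive_err:.2f}, list error: {list_err:.2f}")
```

With only a 1/k fraction of inliers, no single estimate can be trusted, which is why success is measured by the closest entry of a short list rather than by one output.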

research
06/16/2021

Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

We study the problem of list-decodable mean estimation, where an adversa...
research
05/20/2020

List Decodable Mean Estimation in Nearly Linear Time

Learning from data in the presence of outliers is a fundamental problem ...
research
06/22/2022

List-Decodable Covariance Estimation

We give the first polynomial time algorithm for list-decodable covarianc...
research
06/18/2020

List-Decodable Mean Estimation via Iterative Multi-Filtering

We study the problem of list-decodable mean estimation for bounded covar...
research
05/28/2022

List-Decodable Sparse Mean Estimation

Robust mean estimation is one of the most important problems in statisti...
research
02/12/2020

List-Decodable Subspace Recovery via Sum-of-Squares

We give the first efficient algorithm for the problem of list-decodable ...
