List-Decodable Mean Estimation in Nearly-PCA Time

by Ilias Diakonikolas, et al.

Traditionally, robust statistics has focused on designing estimators tolerant to a minority of contaminated data. Robust list-decodable learning focuses on the more challenging regime where only a minority 1/k fraction of the dataset is drawn from the distribution of interest, and no assumptions are made on the remaining data. We study the fundamental task of list-decodable mean estimation in high dimensions. Our main result is a new list-decodable mean estimation algorithm for bounded covariance distributions with optimal sample complexity and error rate, running in nearly-PCA time. Assuming the ground truth distribution on ℝ^d has bounded covariance, our algorithm outputs a list of O(k) candidate means, one of which is within distance O(√k) from the truth. Our algorithm runs in time Õ(ndk) for all k = O(√d) ∪ Ω(d), where n is the size of the dataset. We also show that a variant of our algorithm has runtime Õ(ndk) for all k, at the expense of an O(√(log k)) factor in the recovery guarantee. This runtime matches, up to logarithmic factors, the cost of performing a single k-PCA on the data, which is a natural bottleneck of known algorithms for (very) special cases of our problem, such as clustering well-separated mixtures. Prior to our work, the fastest list-decodable mean estimation algorithms had runtimes Õ(n^2 d k^2) and Õ(ndk^C) with C ≥ 6. Our approach builds on a novel soft downweighting method, SIFT, which is arguably the simplest known polynomial-time mean estimation technique in the list-decodable learning setting. To develop our fast algorithms, we improve the computational cost of SIFT via a careful "win-win-win" analysis of an approximate Ky Fan matrix multiplicative weights procedure we develop, which we believe may be of independent interest.
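To make the problem setup concrete, the following is a minimal sketch of the list-decoding guarantee. It is not the SIFT algorithm from the paper. It assumes a toy dataset in which the non-inlier points form k − 1 decoy clusters (the problem itself allows them to be arbitrary), and it uses a naive baseline that returns a handful of random data points as candidates. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_dataset(n, d, k, true_mean, rng):
    """Return n points in R^d; only n // k of them are inliers around true_mean.

    Assumption for illustration: the remaining points form k - 1 decoy
    clusters shifted along the first coordinate. The actual problem places
    no restriction on them.
    """
    n_inliers = n // k
    points = [[m + rng.gauss(0.0, 1.0) for m in true_mean] for _ in range(n_inliers)]
    for i in range(n - n_inliers):
        shift = 20.0 * (1 + i % (k - 1))  # hypothetical decoy offset
        center = [true_mean[0] + shift] + list(true_mean[1:])
        points.append([c + rng.gauss(0.0, 1.0) for c in center])
    rng.shuffle(points)
    return points


def empirical_mean(points):
    """Coordinate-wise average of all points (a single, non-robust estimate)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def list_error(candidates, true_mean):
    """List-decoding error: distance from the true mean to the closest candidate."""
    return min(math.dist(c, true_mean) for c in candidates)


rng = random.Random(0)
n, d, k = 400, 5, 4
true_mean = [0.0] * d
data = make_dataset(n, d, k, true_mean, rng)

# A single estimate is pulled toward the decoy clusters.
single_error = math.dist(empirical_mean(data), true_mean)

# Naive baseline: a list of 10k random data points. With high probability
# at least one is an inlier, but its error scales like sqrt(d), not sqrt(k).
candidates = rng.sample(data, 10 * k)
baseline_error = list_error(candidates, true_mean)

print(f"single mean error: {single_error:.2f}")
print(f"best of {len(candidates)} random candidates: {baseline_error:.2f}")
```

The sketch shows why a list is needed when inliers are a minority: no single estimate can be accurate, while a short list can contain a good candidate. The naive baseline's error grows with the dimension d, whereas the abstract's guarantee of O(√k) is independent of d.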



Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation

We study the problem of list-decodable mean estimation, where an adversa...

List Decodable Mean Estimation in Nearly Linear Time

Learning from data in the presence of outliers is a fundamental problem ...

List-Decodable Mean Estimation via Iterative Multi-Filtering

We study the problem of list-decodable mean estimation for bounded covar...

Clustering Mixtures with Almost Optimal Separation in Polynomial Time

We consider the problem of clustering mixtures of mean-separated Gaussia...

List-Decodable Sparse Mean Estimation

Robust mean estimation is one of the most important problems in statisti...

Learning from Untrusted Data

The vast majority of theoretical results in machine learning and statist...