High-Dimensional Robust Mean Estimation in Nearly-Linear Time

11/23/2018
by   Yu Cheng, et al.

We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent error guarantees for several families of structured distributions. In this work, we give the first nearly-linear time algorithms for high-dimensional robust mean estimation. Specifically, we focus on distributions with (i) known covariance and sub-gaussian tails, and (ii) unknown bounded covariance. Given N samples on R^d, an ϵ-fraction of which may be arbitrarily corrupted, our algorithms run in time Õ(Nd) / poly(ϵ) and approximate the true mean within the information-theoretically optimal error, up to constant factors. Previous robust algorithms with comparable error guarantees have running times Ω̃(N d^2), for ϵ = Ω(1). Our algorithms rely on a natural family of SDPs parameterized by our current guess ν for the unknown mean μ*. We give a win-win analysis establishing the following: either a near-optimal solution to the primal SDP yields a good candidate for μ* (independent of our current guess ν), or the dual SDP yields a new guess ν' whose distance from μ* is smaller by a constant factor. We exploit the special structure of the corresponding SDPs to show that they are approximately solvable in nearly-linear time. Our approach is quite general, and we believe it can also be applied to obtain nearly-linear time algorithms for other high-dimensional robust learning problems.
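To make the corruption model concrete, here is a minimal Python sketch of the problem setting. It is not the SDP-based algorithm from the abstract. It uses a simple spectral filter as a stand-in baseline, which forms the full d x d empirical covariance and therefore costs on the order of N d^2 per round, the kind of cost the abstract contrasts against. The names `make_corrupted` and `spectral_filter_mean`, the outlier placement, the threshold 1.5, and the 1% drop rate are all illustrative assumptions, and the clean data is assumed to be Gaussian with identity covariance.

```python
import numpy as np


def make_corrupted(n, d, eps, rng):
    """Draw n points from N(mu, I_d), then overwrite an eps-fraction with outliers."""
    mu = np.ones(d)  # hypothetical true mean, chosen for illustration
    x = mu + rng.standard_normal((n, d))
    k = int(eps * n)
    # Hypothetical (and fairly naive) adversary: every outlier sits at the
    # same point, 10 units away from mu along the first coordinate axis.
    x[:k] = mu + 10.0 * np.eye(d)[0]
    return x, mu


def spectral_filter_mean(x, threshold=1.5, drop_frac=0.01, max_rounds=100):
    """Baseline filter (assumed for illustration, not the paper's method).

    While the top eigenvalue of the empirical covariance exceeds `threshold`
    (clean data is assumed to have identity covariance), drop the points with
    the largest projection onto the top eigenvector.
    """
    x = x.copy()
    for _ in range(max_rounds):
        center = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)           # d x d matrix, O(n d^2) to form
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        if eigvals[-1] <= threshold:
            break
        scores = np.abs((x - center) @ eigvecs[:, -1])
        n_drop = max(1, int(drop_frac * len(x)))
        keep = np.argsort(scores)[:-n_drop]     # discard the highest scores
        x = x[keep]
    return x.mean(axis=0)


rng = np.random.default_rng(0)
x, mu = make_corrupted(n=2000, d=50, eps=0.1, rng=rng)

naive_err = np.linalg.norm(x.mean(axis=0) - mu)
filter_err = np.linalg.norm(spectral_filter_mean(x) - mu)
print(f"empirical mean error: {naive_err:.3f}")
print(f"filtered mean error:  {filter_err:.3f}")
```

In this toy setup the empirical mean is pulled toward the outliers by roughly ϵ times their distance from the true mean, while the filtered estimate stays close to the sampling error of the clean points. A real adversary can place outliers far more subtly than this one does, so the sketch only illustrates the setting and says nothing about the guarantees claimed in the abstract.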


