Optimal Robust Linear Regression in Nearly Linear Time

by Yeshwanth Cherapanamjeri, et al.

We study the problem of high-dimensional robust linear regression, where a learner is given access to n samples from the generative model Y = ⟨X, w^*⟩ + ϵ (with X ∈ ℝ^d and ϵ independent), in which an η fraction of the samples have been adversarially corrupted. We propose estimators for this problem under two settings: (i) X is L4-L2 hypercontractive, 𝔼[XX^⊤] has bounded condition number, and ϵ has bounded variance; and (ii) X is sub-Gaussian with identity second moment and ϵ is sub-Gaussian. In both settings, our estimators (a) achieve optimal sample complexities and recovery guarantees up to log factors and (b) run in nearly linear time (Õ(nd/η^6)). Prior to our work, polynomial-time algorithms achieving near-optimal sample complexities were only known in the setting where X is Gaussian with identity covariance and ϵ is Gaussian, and no linear-time estimators were known for robust linear regression in any setting. Our estimators and their analysis leverage recent developments in the construction of faster algorithms for robust mean estimation to improve runtimes, and refined concentration-of-measure arguments alongside Gaussian rounding techniques to improve statistical sample complexities.
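To make the corruption model concrete, here is a minimal Python sketch (standard library only) under stated assumptions. It draws samples from Y = ⟨X, w^*⟩ + ϵ in the Gaussian special case (X ~ N(0, I_d), ϵ ~ N(0, 1)), which is covered by both settings above. It then overwrites an η fraction of the samples with a single far-away cluster and shows that ordinary least squares is dragged far from w^*. This is not the authors' estimator. The function names (`sample_contaminated`, `ols`, `l2_error`) are hypothetical, and the outlier placement is one arbitrary choice; the model permits any adversary.

```python
import math
import random


def sample_contaminated(n, w_star, eta, noise_sd=1.0, seed=0):
    """Draw n samples from Y = <X, w*> + eps with X ~ N(0, I_d) and
    eps ~ N(0, noise_sd^2), then overwrite the first floor(eta * n) samples.
    The corruption (one far-away cluster) is a hypothetical adversary."""
    rng = random.Random(seed)
    d = len(w_star)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(xi * wi for xi, wi in zip(x, w_star)) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    for i in range(int(eta * n)):
        # Hypothetical outlier: large covariate along e_1, wildly wrong response.
        data[i] = ([10.0] + [0.0] * (d - 1), -100.0)
    return data


def ols(data):
    """Ordinary least squares: solve X^T X w = X^T y by Gaussian elimination
    with partial pivoting."""
    d = len(data[0][0])
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for x, y in data:
        for i in range(d):
            b[i] += x[i] * y
            for j in range(d):
                a[i][j] += x[i] * x[j]
    # Forward elimination.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = a[r][col] / a[col][col]
            for c in range(col, d):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        s = b[i] - sum(a[i][j] * w[j] for j in range(i + 1, d))
        w[i] = s / a[i][i]
    return w


def l2_error(w, w_star):
    """Euclidean distance ||w - w*||_2."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(w, w_star)))


if __name__ == "__main__":
    w_star = [1.0, -2.0, 0.5]
    clean = sample_contaminated(2000, w_star, eta=0.0)
    dirty = sample_contaminated(2000, w_star, eta=0.05)
    print("OLS error, clean:    ", l2_error(ols(clean), w_star))
    print("OLS error, corrupted:", l2_error(ols(dirty), w_star))
```

With no corruption, the OLS fit lands close to w^*. Moving just 5% of the points into one high-leverage cluster shifts the first coordinate by several units. An estimator for this problem is meant to keep the recovery error small despite such corruption.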


High-Dimensional Robust Mean Estimation in Nearly-Linear Time

We study the fundamental problem of high-dimensional mean estimation in ...

Non-asymptotic analysis and inference for an outlyingness induced winsorized mean

Robust estimation of a mean vector, a topic regarded as obsolete in the ...

Restricted eigenvalue property for corrupted Gaussian designs

Motivated by the construction of robust estimators using the convex rela...

Robust Mean Estimation under Coordinate-level Corruption

Data corruption, systematic or adversarial, may skew statistical estimat...

Robust Regression Revisited: Acceleration and Improved Estimation Rates

We study fast algorithms for statistical regression problems under the s...

Near-Linear Time Local Polynomial Nonparametric Estimation

Local polynomial regression (Fan & Gijbels, 1996) is an important class ...

Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time

We study the problem of learning Bayesian networks where an ϵ-fraction o...