Efficient Algorithms and Lower Bounds for Robust Linear Regression

05/31/2018
by Ilias Diakonikolas, et al.

We study the problem of high-dimensional linear regression in a robust model where an ϵ-fraction of the samples can be adversarially corrupted. We focus on the fundamental setting where the covariates of the uncorrupted samples are drawn from a Gaussian distribution N(0, Σ) on R^d. We give nearly tight upper bounds and computational lower bounds for this problem. Specifically, our main contributions are as follows: For the case that the covariance matrix is known to be the identity, we give a sample near-optimal and computationally efficient algorithm that outputs a candidate hypothesis vector β̂ which approximates the unknown regression vector β within ℓ_2-norm O(ϵ log(1/ϵ) σ), where σ is the standard deviation of the random observation noise. An error of Ω(ϵσ) is information-theoretically necessary, even with infinite sample size. Prior work gave an algorithm for this problem with sample complexity Ω̃(d^2/ϵ^2) whose error guarantee scales with the ℓ_2-norm of β. For the case of unknown covariance, we show that we can efficiently achieve the same error guarantee as in the known covariance case using an additional Õ(d^2/ϵ^2) unlabeled examples. On the other hand, an error of O(ϵσ) can be information-theoretically attained with O(d/ϵ^2) samples. We prove a Statistical Query (SQ) lower bound providing evidence that this quadratic tradeoff in the sample size is inherent. More specifically, we show that any polynomial time SQ learning algorithm for robust linear regression (in Huber's contamination model) with estimation complexity O(d^{2-c}), where c > 0 is an arbitrarily small constant, must incur an error of Ω(√(ϵ) σ).
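To make the contamination model concrete, here is a minimal NumPy sketch of the setting: Gaussian covariates N(0, I), labels y = ⟨x, β⟩ + noise, and an ϵ-fraction of adversarially shifted labels. The `trimmed_ls` estimator below is a simple iterative-trimming baseline chosen for illustration, not the paper's algorithm, and its name and parameters are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps, sigma = 2000, 10, 0.1, 0.5
beta = rng.normal(size=d)  # unknown regression vector

# Inliers: covariates drawn from N(0, I), labels with Gaussian noise.
X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Adversary corrupts an eps-fraction of the labels.
k = int(eps * n)
y[:k] += 50.0

def trimmed_ls(X, y, eps, iters=20):
    """Illustrative baseline (not the paper's algorithm): repeatedly
    refit OLS on the (1 - eps)-fraction of points with the smallest
    absolute residuals under the current fit."""
    idx = np.arange(len(y))
    for _ in range(iters):
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        r = np.abs(y - X @ b)
        idx = np.argsort(r)[: int((1 - eps) * len(y))]
    return b

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # plain least squares
b_rob = trimmed_ls(X, y, eps)                   # trimming baseline
print("OLS error:    ", np.linalg.norm(b_ols - beta))
print("trimmed error:", np.linalg.norm(b_rob - beta))
```

Plain least squares is pulled far from β by the corrupted labels, while the trimming baseline recovers an estimate with error on the order of the noise level, illustrating the gap the paper's guarantees quantify.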


