A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

by Roberto I. Oliveira, et al.

We revisit heavy-tailed corrupted least-squares linear regression, assuming access to an n-sized label-feature sample in which at most ϵn points are arbitrary outliers. We wish to estimate a p-dimensional parameter b^* from such a sample of a label-feature pair (y,x) satisfying y = ⟨x,b^*⟩ + ξ with heavy-tailed (x,ξ). We only assume that x is L^4-L^2 hypercontractive with constant L > 0 and has covariance matrix Σ with minimum eigenvalue 1/μ^2 > 0 and bounded condition number κ > 0. The noise ξ can depend arbitrarily on x and be nonsymmetric, as long as ξx has a finite covariance matrix Ξ. We propose a near-optimal, computationally tractable estimator, based on the power method, that requires no knowledge of (Σ,Ξ) or of the operator norm of Ξ. With probability at least 1-δ, our estimator attains the statistical rate μ^2‖Ξ‖^{1/2}(p/n + log(1/δ)/n + ϵ)^{1/2} and the breakdown point ϵ ≲ 1/(L^4κ^2), both optimal in the ℓ_2-norm, assuming a minimum sample size L^4κ^2(p log p + log(1/δ)) ≲ n, which is near-optimal up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm that simultaneously satisfies all of these properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction v̂ with respect to the (unknown) preconditioned inner product ⟨Σ(·),·⟩. The second stage estimates the descent direction Σv̂ with respect to the (known) inner product ⟨·,·⟩, without knowing or estimating Σ.
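The abstract names the ingredients (power method, multiplicative weight updates, descent directions) but not the algorithm itself. As a generic illustration of how such ingredients are commonly combined in robust regression, and explicitly not the authors' two-stage estimator, the sketch below runs gradient descent in which each step's mean gradient is replaced by a multiplicatively reweighted ("filtered") mean, with outlier scores taken along a top eigenvector found by power iteration. The function names `top_eigvec`, `mwu_filtered_mean` and `robust_gd` are hypothetical. The stopping rule assumes ϵ is known, and the fixed step size assumes Σ is close to the identity, so there is no counterpart of the paper's Σ-preconditioned first stage.

```python
import numpy as np


def top_eigvec(M, iters=50, seed=0):
    """Power iteration: approximate leading eigenvector of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = M @ v
        nrm = np.linalg.norm(w)
        if nrm == 0.0:  # M v = 0: v already attains the (zero) top eigenvalue
            break
        v = w / nrm
    return v


def mwu_filtered_mean(G, eps, max_rounds=50):
    """Robust mean of the rows of G via multiplicative down-weighting (generic filter).

    Each round: weighted mean and covariance, top direction v by power iteration,
    scores tau_i = <g_i - mean, v>^2, then w_i <- w_i * (1 - tau_i / max tau).
    Stops once about 2*eps of the total mass is removed (assumes eps is known).
    """
    n = G.shape[0]
    w = np.full(n, 1.0 / n)  # unnormalized weights; total mass starts at 1
    for _ in range(max_rounds):
        if w.sum() <= 1.0 - 2.0 * eps:
            break
        p = w / w.sum()
        C = G - p @ G  # rows centered at the current weighted mean
        S = C.T @ (C * p[:, None])  # weighted covariance matrix
        tau = (C @ top_eigvec(S)) ** 2
        t_max = tau[w > 0].max()  # largest score among points still in play
        if t_max == 0.0:
            break
        w_new = w * np.clip(1.0 - tau / t_max, 0.0, None)
        if w_new.sum() == 0.0:
            break
        w = w_new
    return (w / w.sum()) @ G


def robust_gd(X, y, eps, step=0.5, iters=60):
    """Gradient descent on the squared loss using filtered per-sample gradients.

    The step size assumes Sigma is close to I; no preconditioning is attempted.
    """
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        # Row i is the negative gradient of 0.5 * (y_i - <x_i, b>)^2.
        G = (y - X @ b)[:, None] * X
        b = b + step * mwu_filtered_mean(G, eps)
    return b
```

For instance, if ϵn rows are all replaced by one distant label-feature pair, ordinary least squares is pulled far from b^*, while this filtered iteration should stay close to it. Scoring by the squared projection on the top eigenvector targets the direction in which corrupted gradients inflate the empirical covariance, and power iteration finds that direction without a full eigendecomposition.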



Loss minimization and parameter estimation with heavy tails

This work studies applications and generalizations of a simple estimatio...

Outlier-robust sparse/low-rank least-squares regression and robust matrix completion

We consider high-dimensional least-squares regression when a fraction ϵ ...

Robust Estimation of Covariance Matrices: Adversarial Contamination and Beyond

We consider the problem of estimating the covariance structure of a rand...

Near Optimal Heteroscedastic Regression with Symbiotic Learning

We consider the problem of heteroscedastic linear regression, where, giv...

Concentration study of M-estimators using the influence function

We present a new finite-sample analysis of M-estimators of locations in ...

Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries

Estimation of the covariance matrix has attracted a lot of attention of ...

Reliable Covariance Estimation

Covariance or scatter matrix estimation is ubiquitous in most modern sta...
