A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance & heterogeneous noise

09/06/2022
by Roberto I. Oliveira, et al.

We revisit heavy-tailed corrupted least-squares linear regression, given a label-feature sample of size n in which at most ϵn points are arbitrary outliers. We wish to estimate a p-dimensional parameter b^* from such a sample of a label-feature pair (y,x) satisfying y=⟨x,b^*⟩+ξ with heavy-tailed (x,ξ). We only assume that x is L^4-L^2 hypercontractive with constant L>0 and has covariance matrix Σ with minimum eigenvalue 1/μ^2>0 and bounded condition number κ>0. The noise ξ can be arbitrarily dependent on x and nonsymmetric, as long as ξx has a finite covariance matrix Ξ. We propose a near-optimal, computationally tractable estimator, based on the power method, that requires no knowledge of (Σ,Ξ) or of the operator norm of Ξ. With probability at least 1-δ, our proposed estimator attains the statistical rate μ^2‖Ξ‖^{1/2}(p/n+log(1/δ)/n+ϵ)^{1/2} and breakdown point ϵ≲1/(L^4κ^2), both optimal in the ℓ_2-norm, assuming the near-optimal minimum sample size L^4κ^2(p log p + log(1/δ))≲ n, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm that simultaneously satisfies all of the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction v̂ with respect to the (unknown) pre-conditioned inner product ⟨Σ(·),·⟩. The second stage estimates the descent direction Σv̂ with respect to the (known) inner product ⟨·,·⟩, without knowing or estimating Σ.
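To make the setting concrete, here is a minimal Python sketch of two ingredients named in the abstract: a sample from the model y=⟨x,b^*⟩+ξ with an ϵ-fraction of outliers, and a plain power method for the leading eigenpair of a symmetric positive semidefinite matrix. This is not the authors' two-stage estimator. The function names, the Student-t distributions, the choice of b^* and the outlier values are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def make_corrupted_sample(n, p, eps, seed=0):
    """Draw (X, y) from y = <x, b*> + xi, then overwrite int(eps * n) rows.

    Assumed for illustration only: Student-t features (df=5, so fourth
    moments are finite), Student-t noise (df=3, so the variance is finite),
    and a crude fixed-value contamination of the first rows.
    """
    rng = np.random.default_rng(seed)
    b_star = np.ones(p) / np.sqrt(p)        # hypothetical true parameter
    X = rng.standard_t(df=5, size=(n, p))   # heavy-tailed features
    xi = rng.standard_t(df=3, size=n)       # heavy-tailed noise
    y = X @ b_star + xi
    k = int(eps * n)                        # number of corrupted points
    X[:k] = 10.0                            # arbitrary outlier features
    y[:k] = 1e3                             # arbitrary outlier labels
    return X, y, b_star


def power_method(A, num_iters=200, seed=0):
    """Approximate the leading eigenpair of a symmetric PSD matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:                     # v lies in the null space of A
            break
        v = w / norm
    return float(v @ A @ v), v              # Rayleigh quotient, unit vector


if __name__ == "__main__":
    X, y, b_star = make_corrupted_sample(n=400, p=5, eps=0.05)
    second_moment = X.T @ X / X.shape[0]
    lam, v = power_method(second_moment)
    print("leading eigenvalue of the contaminated second-moment matrix:", lam)
```

On a contaminated sample the leading eigenvalue of the empirical second-moment matrix tends to be dominated by the outliers, which gives some intuition for why a spectral quantity can guide a reweighting scheme; how the paper actually uses it is described in the full text.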
