Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers

11/04/2021
by   Tommaso d'Orsi, et al.

We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, in the presence of an oblivious adversary that may corrupt responses in all but an α fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA).

For sparse regression, we achieve consistency for optimal sample size n ≳ (k log d)/α^2 and optimal error rate O(√((k log d)/(n·α^2))), where n is the number of observations, d is the number of dimensions, and k is the sparsity of the parameter vector; this allows the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers α is o(1/log log n), even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise were only shown very recently, in the dense setting (i.e., general linear regression), by d'Orsi et al. [dNS21].

In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (of the kind usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumption that the measurement noise corresponding to the inliers is polynomially small in n (e.g., Gaussian with variance 1/n^2).

To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the ℓ_1 norm or the nuclear norm, and extend the approach of d'Orsi et al. [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.
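To illustrate the kind of estimator the abstract describes for sparse regression, here is a minimal sketch of the Huber loss equipped with an ℓ_1 regularizer. The proximal-gradient (ISTA) solver, the function names, and the parameter choices (δ, λ) are illustrative assumptions for this sketch, not the paper's algorithm or analysis; the point is only that clipping residuals at ±δ bounds the influence of corrupted responses while the ℓ_1 penalty enforces sparsity.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.0, n_iter=1500):
    """Minimize (1/n) * sum_i Huber_delta(y_i - <x_i, b>) + lam * ||b||_1
    by proximal gradient descent (ISTA).  Illustrative sketch only."""
    n, d = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(d)
    for _ in range(n_iter):
        r = X @ beta - y
        # Huber gradient: residuals clipped at +/- delta, so each
        # (possibly corrupted) sample has bounded influence.
        g = X.T @ np.clip(r, -delta, delta) / n
        beta = soft_threshold(beta - step * g, step * lam)
    return beta

# Demo: roughly half the responses are hit by large oblivious noise.
rng = np.random.default_rng(0)
n, d, k = 400, 50, 3
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:k] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
outliers = rng.random(n) < 0.5            # inlier fraction alpha ~ 0.5
y[outliers] += 20.0 * rng.standard_normal(outliers.sum())
beta_hat = huber_lasso(X, y, lam=0.05)
```

For the PCA setting, the same template applies with the ℓ_1 proximal step replaced by singular-value soft-thresholding (the proximal operator of the nuclear norm).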


