Robust High Dimensional Sparse Regression and Matching Pursuit

01/12/2013
by Yudong Chen, et al.
We consider high-dimensional sparse regression and develop strategies able to deal with arbitrary (possibly severe or coordinated) errors in the covariate matrix X. These errors may come from corrupted data, persistent experimental errors, or malicious respondents in surveys and recommender systems. Such non-stochastic error-in-variables problems are notoriously difficult to treat and, as we demonstrate, the difficulty is particularly pronounced in high-dimensional settings where the primary goal is support recovery of the sparse regressor. We develop algorithms for support recovery in sparse regression when some number n_1 out of n+n_1 total covariate/response pairs are arbitrarily (possibly maliciously) corrupted. We are interested in understanding how many outliers, n_1, we can tolerate while still identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (which focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures can provide guarantees on support recovery. Perhaps surprisingly, we also show that the natural brute-force algorithm, which searches over all subsets of n covariate/response pairs and all subsets of possible support coordinates in order to minimize regression error, is remarkably poor: it is unable to correctly identify the support with even n_1 = O(n/k) corrupted points, where k is the sparsity. This is true even in the basic setting we consider, where all authentic measurements and noise are independent and sub-Gaussian. In this setting, we provide a simple algorithm, no more computationally taxing than OMP, that gives stronger performance guarantees, recovering the support with up to n_1 = O(n/(√k log p)) corrupted points, where p is the dimension of the signal to be recovered.
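
The abstract does not describe the algorithm itself. As a rough illustration of how a procedure with OMP-like cost might resist corrupted covariate/response pairs, the sketch below scores each coordinate by a trimmed inner product between its column of X and the response, discarding the n_1 largest-magnitude terms before summing, and keeps the k highest-scoring coordinates. The scoring rule, the function names (`trimmed_inner_product`, `robust_support`, `demo`), and the toy data are all assumptions made for illustration; none of them is taken from the abstract, and the sketch carries no guarantee.

```python
import numpy as np


def trimmed_inner_product(a, b, n_trim):
    """Sum of elementwise products a*b after dropping the n_trim
    products with the largest absolute value (hypothetical helper)."""
    q = a * b
    if n_trim <= 0:
        return q.sum()
    # Indices of the products sorted by magnitude; keep the smallest ones.
    keep = np.argsort(np.abs(q))[: len(q) - n_trim]
    return q[keep].sum()


def robust_support(X, y, k, n_trim):
    """Return the k coordinates whose trimmed correlation with y is largest."""
    scores = np.array(
        [abs(trimmed_inner_product(X[:, j], y, n_trim)) for j in range(X.shape[1])]
    )
    return np.sort(np.argsort(scores)[-k:])


def demo(seed=0):
    """Toy experiment: n authentic Gaussian rows plus n1 adversarial rows."""
    rng = np.random.default_rng(seed)
    n, n1, p, k = 1000, 10, 50, 3
    beta = np.zeros(p)
    beta[:k] = 1.0  # true support is {0, 1, 2}

    X = rng.standard_normal((n, p))
    y = X @ beta + 0.1 * rng.standard_normal(n)

    # Corrupted pairs: large entries on an off-support coordinate (p - 1),
    # paired with large responses so that it looks strongly correlated.
    X_bad = np.zeros((n1, p))
    X_bad[:, p - 1] = 100.0
    y_bad = np.full(n1, 100.0)

    Xc = np.vstack([X, X_bad])
    yc = np.concatenate([y, y_bad])

    plain = robust_support(Xc, yc, k, n_trim=0)    # ordinary inner products
    robust = robust_support(Xc, yc, k, n_trim=n1)  # trimmed inner products
    return plain, robust


if __name__ == "__main__":
    plain, robust = demo()
    print("untrimmed selection:", plain)
    print("trimmed selection:  ", robust)
```

In this toy setup the untrimmed ranking is pulled toward the corrupted coordinate, while trimming discards the handful of outsized products before they can dominate the score. The cost is one sort per coordinate, which is comparable to a single correlation pass in OMP.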

research
05/29/2018

High Dimensional Robust Sparse Regression

We provide a novel -- and to the best of our knowledge, the first -- alg...
research
12/05/2013

Swapping Variables for High-Dimensional Sparse Regression with Correlated Measurements

We consider the high-dimensional sparse linear regression problem of acc...
research
02/28/2019

Model Agnostic High-Dimensional Error-in-Variable Regression

We consider the problem of high-dimensional error-in-variable regression...
research
05/11/2020

Scalable Interpretable Learning for Multi-Response Error-in-Variables Regression

Corrupted data sets containing noisy or missing observations are prevale...
research
10/12/2018

Spherical Regression under Mismatch Corruption with Application to Automated Knowledge Translation

Motivated by a series of applications in data integration, language tran...
research
06/08/2015

Robust Regression via Hard Thresholding

We study the problem of Robust Least Squares Regression (RLSR) where sev...
research
03/21/2018

Randomized Projection Methods for Linear Systems with Arbitrarily Large Sparse Corruptions

In applications like medical imaging, error correction, and sensor netwo...
