High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

by Po-Ling Loh, et al.

Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly with dependence as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum and, more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.
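To make the projected gradient descent idea concrete, here is a minimal sketch in plain Python. The abstract does not spell out the estimator, so everything below is an illustrative assumption: covariates observed with additive noise of known variance, a bias-corrected quadratic objective built from the noisy covariates, an ℓ1-ball constraint, and a fixed step size. The function names (`project_l1_ball`, `corrected_surrogates`, `projected_gradient_descent`, `simulate`) are hypothetical and not taken from the paper.

```python
import random


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {b : ||b||_1 <= radius} (sort-based method)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t  # the last index passing this test sets the soft threshold
    return [(abs(x) - theta if abs(x) > theta else 0.0) * (1 if x >= 0 else -1)
            for x in v]


def corrected_surrogates(Z, y, noise_var):
    """Assumed setup: Z = X + W with Cov(W) = noise_var * I.

    Returns Gamma = Z^T Z / n - noise_var * I and gamma = Z^T y / n.
    Gamma can be indefinite when p > n, which makes the objective nonconvex.
    """
    n, p = len(Z), len(Z[0])
    Gamma = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / n
              - (noise_var if a == b else 0.0)
              for b in range(p)] for a in range(p)]
    gamma = [sum(Z[i][a] * y[i] for i in range(n)) / n for a in range(p)]
    return Gamma, gamma


def projected_gradient_descent(Gamma, gamma, radius, step=0.5, iters=200):
    """Minimize 0.5 * b^T Gamma b - gamma^T b over the l1 ball of given radius."""
    p = len(gamma)
    beta = [0.0] * p
    for _ in range(iters):
        grad = [sum(Gamma[a][b] * beta[b] for b in range(p)) - gamma[a]
                for a in range(p)]
        beta = project_l1_ball([beta[a] - step * grad[a] for a in range(p)], radius)
    return beta


def simulate(n=500, p=10, noise_sd=0.2, seed=0):
    """Synthetic data: sparse beta_star, clean X, noisy observed covariates Z."""
    rng = random.Random(seed)
    beta_star = [1.0, -1.0] + [0.0] * (p - 2)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta_star)) + rng.gauss(0, 0.1) for row in X]
    Z = [[x + rng.gauss(0, noise_sd) for x in row] for row in X]
    return Z, y, beta_star


if __name__ == "__main__":
    Z, y, beta_star = simulate()
    Gamma, gamma = corrected_surrogates(Z, y, noise_var=0.2 ** 2)
    radius = sum(abs(b) for b in beta_star)  # oracle radius, for illustration only
    beta_hat = projected_gradient_descent(Gamma, gamma, radius)
    err = sum((a - b) ** 2 for a, b in zip(beta_hat, beta_star)) ** 0.5
    print(f"l2 error: {err:.3f}")
```

The toy problem here is low-dimensional (p < n), so the corrected matrix is very likely positive definite and the program is effectively convex; the high-dimensional regime the abstract addresses is where the nonconvexity appears. The step size and the oracle choice of radius are conveniences for the demonstration, not recommendations.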



Fast global convergence of gradient methods for high-dimensional statistical recovery

Many statistical M-estimators are based on convex optimization problems ...

An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond

Missing data are frequently encountered in high-dimensional problems, bu...

Low-rank matrix estimation in multi-response regression with measurement errors: Statistical and computational guarantees

In this paper, we investigate the matrix estimation problem in the multi...

On Principal Component Regression in a High-Dimensional Error-in-Variables Setting

We analyze the classical method of Principal Component Regression (PCR) ...

Errors-in-variables models with dependent measurements

Suppose that we observe y ∈ R^n and X ∈ R^{n × m} in the following errors-in...

Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints

Today, data analysts largely rely on intuition to determine whether miss...

Scalable Interpretable Learning for Multi-Response Error-in-Variables Regression

Corrupted data sets containing noisy or missing observations are prevale...
