Rate Optimal Estimation and Confidence Intervals for High-dimensional Regression with Missing Covariates

02/09/2017
by   Yining Wang, et al.

Although a majority of the theoretical literature in high-dimensional statistics has focused on settings which involve fully-observed data, settings with missing values and corruptions are common in practice. We consider the problems of estimation and of constructing component-wise confidence intervals in a sparse high-dimensional linear regression model when some covariates of the design matrix are missing completely at random. We analyze a variant of the Dantzig selector [9] for estimating the regression model and we use a de-biasing argument to construct component-wise confidence intervals. Our first main result is to establish upper bounds on the estimation error as a function of the model parameters (the sparsity level s, the expected fraction of observed covariates ρ_*, and a measure of the signal strength ‖β^*‖_2). We find that even in an idealized setting where the covariates are assumed to be missing completely at random, somewhat surprisingly and in contrast to the fully-observed setting, there is a dichotomy in the dependence on model parameters and much faster rates are obtained if the covariance matrix of the random design is known. To study this issue further, our second main contribution is to provide lower bounds on the estimation error showing that this discrepancy in rates is unavoidable in a minimax sense. We then consider the problem of high-dimensional inference in the presence of missing data. We construct and analyze confidence intervals using a de-biased estimator. In the presence of missing data, inference is complicated by the fact that the de-biasing matrix is correlated with the pilot estimator and this necessitates the design of a new estimator and a novel analysis. We also complement our mathematical study with extensive simulations on synthetic and semi-synthetic data that show the accuracy of our asymptotic predictions for finite sample sizes.
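To make the missing-completely-at-random setting concrete, here is a minimal NumPy sketch of a standard bias correction for zero-filled covariates. It is an illustration under stated assumptions and not necessarily the construction used in the paper: the observation probability `rho` is taken as known and equal for every entry, the AR(1) covariance `Sigma`, the sample sizes, and the helper names `mcar_surrogates` and `dantzig_residual` are all hypothetical. The residual ‖Σ̂β − γ̂‖_∞ computed below is the quantity that a Dantzig-selector-type program would constrain to be at most a tuning parameter λ.

```python
import numpy as np

def mcar_surrogates(Z, y, rho):
    """Bias-corrected moment estimates from zero-filled covariates.

    Z   : (n, d) design matrix with missing entries set to 0
    y   : (n,) response vector
    rho : probability that an entry is observed (assumed known here)
    """
    n = Z.shape[0]
    gram = Z.T @ Z / n
    # With independent Bernoulli(rho) masks, E[gram] equals rho^2 * Sigma
    # off the diagonal and rho * Sigma on the diagonal.
    sigma_hat = gram / rho**2
    np.fill_diagonal(sigma_hat, np.diag(gram) / rho)
    # E[Z^T y / n] = rho * Sigma @ beta, so divide by rho.
    gamma_hat = Z.T @ y / (n * rho)
    return sigma_hat, gamma_hat

def dantzig_residual(beta, sigma_hat, gamma_hat):
    """Sup-norm residual ||sigma_hat @ beta - gamma_hat||_inf."""
    return np.max(np.abs(sigma_hat @ beta - gamma_hat))

# Synthetic data (all sizes and parameters are illustrative assumptions).
rng = np.random.default_rng(0)
n, d, s, rho = 100_000, 10, 3, 0.7
idx = np.arange(d)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # AR(1) covariance
beta_star = np.zeros(d)
beta_star[:s] = 1.0                                   # s-sparse signal

X = rng.standard_normal((n, d)) @ np.linalg.cholesky(Sigma).T
y = X @ beta_star + 0.5 * rng.standard_normal(n)
mask = rng.random((n, d)) < rho                       # MCAR observation pattern
Z = np.where(mask, X, 0.0)                            # zero-fill missing entries

sigma_hat, gamma_hat = mcar_surrogates(Z, y, rho)
corrected = dantzig_residual(beta_star, sigma_hat, gamma_hat)
# Naive moments ignore the missingness and are biased when Sigma is not diagonal.
naive = dantzig_residual(beta_star, Z.T @ Z / n, Z.T @ y / n)
print(f"corrected residual at beta*: {corrected:.3f}")
print(f"naive residual at beta*:     {naive:.3f}")
```

In this sketch the true coefficient vector nearly satisfies the corrected constraint, while the uncorrected moments leave a residual that does not vanish as n grows. Note that a surrogate such as `sigma_hat` need not be positive semidefinite, which is one reason a linear-programming formulation like the Dantzig selector is convenient in this setting.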


research
03/03/2018

Confidence intervals for high-dimensional Cox models

The purpose of this paper is to construct confidence intervals for the r...
research
01/24/2020

Imputation for High-Dimensional Linear Regression

We study high-dimensional regression with missing entries in the covaria...
research
09/04/2023

Challenges of the inconsistency regime: Novel debiasing methods for missing data models

We study semi-parametric estimation of the population mean when data is ...
research
10/27/2020

On Principal Component Regression in a High-Dimensional Error-in-Variables Setting

We analyze the classical method of Principal Component Regression (PCR) ...
research
02/26/2018

Missing Data in Sparse Transition Matrix Estimation for Sub-Gaussian Vector Autoregressive Processes

High-dimensional time series data exist in numerous areas such as financ...
research
07/29/2017

Fine-Gray competing risks model with high-dimensional covariates: estimation and Inference

The purpose of this paper is to construct confidence intervals for the r...
research
09/22/2016

Robust Confidence Intervals in High-Dimensional Left-Censored Regression

This paper develops robust confidence intervals in high-dimensional and ...
