CAD: Debiasing the Lasso with inaccurate covariate model

07/29/2021
by Michael Celentano, et al.

We consider the problem of estimating a low-dimensional parameter in high-dimensional linear regression. Constructing an approximately unbiased estimate of the parameter of interest is a crucial step towards performing statistical inference. Several authors suggest orthogonalizing both the variable of interest and the outcome with respect to the nuisance variables, and then regressing the residual outcome on the residual variable. This is possible if the covariance structure of the regressors is perfectly known, or is sufficiently structured that it can be estimated accurately from data (e.g., the precision matrix is sufficiently sparse). Here we consider a regime in which the covariate model can only be estimated inaccurately, and hence existing debiasing approaches are not guaranteed to work. When errors in estimating the covariate model are correlated with errors in estimating the linear model parameter, the bias is only partially eliminated. We propose the Correlation Adjusted Debiased Lasso (CAD), which nearly eliminates this bias in some cases, including cases in which the estimation errors are neither negligible nor orthogonal. We consider a setting in which some unlabeled samples may be available to the statistician alongside labeled ones (semi-supervised learning), and our guarantees hold under the assumption of jointly Gaussian covariates. The new debiased estimator is guaranteed to cancel the bias in two cases: (1) when the total number of samples (labeled and unlabeled) is larger than the number of parameters, or (2) when the covariance of the nuisance (but not the effect of the nuisance on the variable of interest) is known. Neither of these cases is treated by state-of-the-art methods.
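
The orthogonalize-then-regress idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the CAD estimator and not the authors' code: it is a low-dimensional toy (more samples than parameters) in which ordinary least squares stands in for both the covariate model and the nuisance fit. The function name `residual_on_residual`, the simulated data, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def residual_on_residual(y, x, Z):
    """Partial the nuisance columns Z out of x and y, then regress residuals.

    Toy stand-in: least-squares projections play the role of the covariate
    model (x given Z) and of the nuisance fit (y given Z).
    """
    gamma_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)  # x regressed on Z
    delta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)  # y regressed on Z
    x_res = x - Z @ gamma_hat  # residual variable of interest
    y_res = y - Z @ delta_hat  # residual outcome
    # Slope of the residual outcome regressed on the residual variable.
    return float(x_res @ y_res / (x_res @ x_res))

# Simulated Gaussian design (all values hypothetical).
rng = np.random.default_rng(0)
n, p = 200, 5
Z = rng.standard_normal((n, p))                  # nuisance variables
gamma = rng.standard_normal(p)                   # effect of nuisance on x
x = Z @ gamma + rng.standard_normal(n)           # variable of interest
theta, beta = 1.5, rng.standard_normal(p)        # target and nuisance effects
y = theta * x + Z @ beta + 0.5 * rng.standard_normal(n)

theta_res = residual_on_residual(y, x, Z)

# Reference: coefficient on x from the full least-squares fit of y on [x, Z].
coef_full, *_ = np.linalg.lstsq(np.column_stack([x, Z]), y, rcond=None)
theta_ols = float(coef_full[0])
```

In this toy setting the two estimates coincide up to floating-point error (the Frisch-Waugh-Lovell identity). The regime the abstract targets is different: there the projection of `x` on `Z` cannot be estimated accurately, which is where the residual-based estimate can retain bias.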

research
07/25/2017

Compressed Sparse Linear Regression

High-dimensional sparse linear regression is a basic problem in machine ...
research
08/11/2015

De-biasing the Lasso: Optimal Sample Size for Gaussian Designs

Performing statistical inference in high-dimension is an outstanding cha...
research
11/28/2020

Optimal Semi-supervised Estimation and Inference for High-dimensional Linear Regression

There are many scenarios such as the electronic health records where the...
research
06/07/2021

Semi-Supervised Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data

Blockwise missing data occurs frequently when we integrate multisource o...
research
12/01/2020

Systematic errors in estimates of R_t from symptomatic cases in the presence of observation bias

We consider the problem of estimating the reproduction number R_t of an ...
research
06/20/2016

On the prediction loss of the lasso in the partially labeled setting

In this paper we revisit the risk bounds of the lasso estimator in the c...
research
04/18/2023

Sharp-SSL: Selective high-dimensional axis-aligned random projections for semi-supervised learning

We propose a new method for high-dimensional semi-supervised learning pr...
