Semi-Supervised Statistical Inference for High-Dimensional Linear Regression with Blockwise Missing Data

06/07/2021
by Fei Xue, et al.

Blockwise missing data occurs frequently when we integrate multisource or multimodality data where different sources or modalities contain complementary information. In this paper, we consider a high-dimensional linear regression model with blockwise missing covariates and a partially observed response variable. Under this semi-supervised framework, we propose a computationally efficient estimator for the regression coefficient vector based on carefully constructed unbiased estimating equations and a multiple blockwise imputation procedure, and obtain its rates of convergence. Furthermore, building upon an innovative semi-supervised projected estimating equation technique that intrinsically achieves bias-correction of the initial estimator, we propose nearly unbiased estimators for the individual regression coefficients that are asymptotically normally distributed under mild conditions. By carefully analyzing these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and application analysis of the Alzheimer's Disease Neuroimaging Initiative data show that the proposed method performs better and benefits more from unsupervised samples than existing methods.
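
To make the data structure in the abstract concrete, the following is a minimal sketch of a toy generator for blockwise missing covariates with a partially observed response. It is not the authors' simulation design or estimator. The function name `simulate_blockwise_missing`, the block sizes, the missing pattern (each sample group lacks exactly one block), and the sparse coefficient vector are all assumptions chosen for illustration.

```python
import random


def simulate_blockwise_missing(n_per_group=50, block_sizes=(3, 4, 5),
                               n_unlabeled=30, seed=0):
    """Hypothetical toy generator, not the paper's simulation setup."""
    rng = random.Random(seed)
    p = sum(block_sizes)

    # Column indices belonging to each source/modality block.
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(list(range(start, start + size)))
        start += size

    # Sparse coefficient vector (assumed): only the first two entries are nonzero.
    beta = [1.0, 1.0] + [0.0] * (p - 2)

    n_groups = len(block_sizes) + 1
    X, y, group = [], [], []
    for g in range(n_groups):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(p)]
            # Linear model response, computed before any covariate is hidden.
            y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0))
            # Group 0 is fully observed; group g > 0 misses block g - 1 entirely.
            if g > 0:
                for j in blocks[g - 1]:
                    row[j] = None
            X.append(row)
            group.append(g)

    # Semi-supervised part: hide the response for a random subset of samples.
    for i in rng.sample(range(len(y)), n_unlabeled):
        y[i] = None
    return X, y, group, blocks
```

With the default arguments this produces 200 samples and 12 covariates: missing covariates are marked `None` in whole blocks per group, and 30 samples have a hidden response, which play the role of the unsupervised samples mentioned above.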

research
02/14/2021

Improved Estimators for Semi-supervised High-dimensional Regression Model

We study a linear high-dimensional regression model in a semi-supervised...
research
06/16/2018

Semi-supervised Inference for Explained Variance in High-dimensional Linear Regression and Its Applications

We consider statistical inference for the explained variance β^⊤Σβ under ...
research
11/06/2020

Estimation, Confidence Intervals, and Large-Scale Hypotheses Testing for High-Dimensional Mixed Linear Regression

This paper studies the high-dimensional mixed linear regression (MLR) wh...
research
07/24/2022

Statistical inference for high-dimensional generalized estimating equations

We propose a novel inference procedure for linear combinations of high-d...
research
03/28/2015

Sparse Linear Regression With Missing Data

This paper proposes a fast and accurate method for sparse regression in ...
research
07/29/2021

CAD: Debiasing the Lasso with inaccurate covariate model

We consider the problem of estimating a low-dimensional parameter in hig...
research
07/22/2023

Collaboratively Learning Linear Models with Structured Missing Data

We study the problem of collaboratively learning least squares estimates...
