The More Data, the Better? Demystifying Deletion-Based Methods in Linear Regression with Missing Data

10/26/2020
by   Tianchen Xu, et al.

We compare two deletion-based methods for dealing with the problem of missing observations in linear regression analysis. One is the complete-case analysis (CC, or listwise deletion), which discards all incomplete observations and uses only the fully observed samples for ordinary least-squares estimation. The other is the available-case analysis (AC, or pairwise deletion), which uses all available data to estimate the covariance matrices and applies these matrices to construct the normal equation. We show that the estimates from both methods are asymptotically unbiased and further compare their asymptotic variances in some typical situations. Surprisingly, using more data (i.e., AC) does not necessarily lead to better asymptotic efficiency in many scenarios. Missing-data patterns, covariance structure and true regression coefficient values all play a role in determining which is better. We further conduct simulation studies to corroborate the findings and demystify what has been missed or misinterpreted in the literature. Some detailed proofs and simulation results are available in the online supplemental materials.
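The two estimators described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the simulated data, the 30% missingness rate and the choice to center each pairwise covariance by the means of the jointly observed rows are all assumptions made for illustration, and other AC variants exist. Missing values are assumed to be encoded as NaN and to occur completely at random in the covariates only.

```python
import numpy as np

def complete_case_ols(X, y):
    """CC (listwise deletion): OLS slopes from rows with no missing value."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    Xc = X[keep] - X[keep].mean(axis=0)
    yc = y[keep] - y[keep].mean()
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

def pairwise_cov(a, b):
    """Covariance of a and b over the rows where both are observed."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    return np.mean((a[ok] - a[ok].mean()) * (b[ok] - b[ok].mean()))

def available_case_ols(X, y):
    """AC (pairwise deletion): build the normal equation from pairwise covariances."""
    p = X.shape[1]
    Sxx = np.array([[pairwise_cov(X[:, j], X[:, k]) for k in range(p)]
                    for j in range(p)])
    Sxy = np.array([pairwise_cov(X[:, j], y) for j in range(p)])
    # Sxx assembled this way is not guaranteed to be positive definite.
    return np.linalg.solve(Sxx, Sxy)

# Hypothetical simulation setup (illustrative values only).
rng = np.random.default_rng(0)
n = 5000
beta = np.array([1.0, -0.5])
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
y = X @ beta + rng.normal(size=n)

# Delete about 30% of each covariate independently (missing completely at random).
X_miss = X.copy()
X_miss[rng.random(n) < 0.3, 0] = np.nan
X_miss[rng.random(n) < 0.3, 1] = np.nan

print("CC estimate:", complete_case_ols(X_miss, y))
print("AC estimate:", available_case_ols(X_miss, y))
```

A single run like this only shows that both estimates land near the true coefficients. Comparing their efficiency would require repeating the simulation many times and comparing the empirical variances under different missing patterns and covariance structures.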


research
05/01/2023

Predicting blood pressure under circumstances of missing data: An analysis of missing data patterns and imputation methods using NHANES

The World Health Organization defines cardio-vascular disease (CVD) as "...
research
09/12/2012

Likelihood Estimation with Incomplete Array Variate Observations

Missing data is an important challenge when dealing with high dimensiona...
research
11/08/2021

Sequence Reconstruction Problem for Deletion Channels: A Complete Asymptotic Solution

Transmit a codeword x, that belongs to an (ℓ-1)-deletion-correcting code...
research
03/24/2021

Envelope Methods with Ignorable Missing Data

Envelope method was recently proposed as a method to reduce the dimensio...
research
03/28/2015

Sparse Linear Regression With Missing Data

This paper proposes a fast and accurate method for sparse regression in ...
research
06/26/2019

Preliminary test estimation in ULAN models

Preliminary test estimation, which is a natural procedure when it is sus...
research
05/25/2017

Fast Causal Inference with Non-Random Missingness by Test-Wise Deletion

Many real datasets contain values missing not at random (MNAR). In this ...
