Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation

10/20/2022
by Yong-Shiuan Lee, et al.

Variable selection is crucial for sparse modeling in the age of big data. Missing values are common in practice and complicate variable selection. Multiple imputation (MI) fills in the missing values several times, yielding multiply imputed datasets, and has been widely used with various variable selection procedures. However, directly performing variable selection on the full MI data or on bootstrapped MI data may not be worth the computational cost. To quickly identify the active variables in a linear regression model, we propose an adaptive grafting procedure with three pooling rules on MI data. The proposed methods proceed iteratively: they start by finding the active variables based on the complete-case subset and then expand the working data matrix in both the number of active variables and the number of available observations. A comprehensive simulation study examines the selection accuracy from several perspectives, as well as the computational efficiency, of the proposed methods. Two real-life examples illustrate the strengths of the proposed methods.
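
To make the starting step concrete, the following is a minimal sketch of grafting-style greedy forward selection restricted to the complete-case rows. It is not the authors' procedure: the function name `greedy_forward_select`, the `threshold` parameter, the column standardization, and the stopping rule are all assumptions made for illustration, and the expansion to MI data with the three pooling rules is not shown.

```python
import numpy as np

def greedy_forward_select(X, y, threshold=0.1, max_vars=None):
    """Illustrative grafting-style forward selection on complete cases.

    Hypothetical helper, not the paper's algorithm: it only mimics the
    idea of adding, one at a time, the variable whose squared-error
    gradient is largest in magnitude.
    """
    # Keep only rows with no missing predictor or response (complete cases).
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    Xc, yc = X[complete], y[complete]
    n, p = Xc.shape

    # Standardize columns (assumed non-constant) so gradients are comparable.
    Xc = (Xc - Xc.mean(axis=0)) / Xc.std(axis=0)
    yc = yc - yc.mean()

    active = []
    residual = yc.copy()
    max_vars = p if max_vars is None else max_vars

    while len(active) < max_vars:
        # Gradient magnitude of the squared-error loss for each variable.
        grad = np.abs(Xc.T @ residual) / n
        if active:
            grad[active] = 0.0  # already-selected variables are not candidates
        j = int(np.argmax(grad))
        if grad[j] <= threshold:
            break  # no remaining variable has a large enough gradient
        active.append(j)
        # Refit least squares on the active set and update the residual.
        beta, *_ = np.linalg.lstsq(Xc[:, active], yc, rcond=None)
        residual = yc - Xc[:, active] @ beta

    return active
```

In this sketch, rows with any missing entry are simply dropped; in the approach summarized above, the working data matrix would instead grow as more variables become active and more observations become usable.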

research
10/11/2017

Variable Selection in Restricted Linear Regression Models

The use of prior information in the linear regression is well known to p...
research
07/07/2017

Exhaustive search for sparse variable selection in linear regression

We propose a K-sparse exhaustive search (ES-K) method and a K-sparse app...
research
06/03/2022

Estimation and variable selection in joint mean and dispersion models applied to mixture experiments

In industrial experiments, controlling variability is of paramount impor...
research
07/26/2022

An exhaustive variable selection study for linear models of soundscape emotions: rankings and Gibbs analysis

In the last decade, soundscapes have become one of the most active topic...
research
05/26/2022

Variable Selection for Individualized Treatment Rules with Discrete Outcomes

An individualized treatment rule (ITR) is a decision rule that aims to i...
research
11/10/2021

Variable selection and missing data imputation in categorical genomic data analysis by integrated ridge regression and random forest

Genomic data arising from a genome-wide association study (GWAS) are oft...
research
09/17/2018

Spatial Variable Selection and An Application to Virginia Lyme Disease Emergence

Lyme disease is an infectious disease that is caused by a bacterium call...
