A review and recommendations on variable selection methods in regression models for binary data

01/16/2022
by Souvik Bag, et al.

Selecting the essential variables in logistic regression is vital because the model is used extensively in medical studies, finance, economics and related fields. In this paper, we explore four main typologies (test-based, penalty-based, screening-based, and tree-based) of frequentist variable selection methods in the logistic regression setting. The primary objective of this work is to give practitioners a comprehensive overview of the existing literature. The underlying assumptions and theory of the methods, along with the specifics of their implementations, are detailed as well. Next, we conduct a thorough simulation study to explore the performance of fifteen different methods in terms of variable selection, estimation of coefficients, prediction accuracy, and time complexity under various settings. We consider low-, moderate- and high-dimensional setups and different correlation structures for the covariates. A real-life application, using high-dimensional gene expression data, is also included in this study to further understand the efficacy and consistency of the methods. Finally, based on our findings in the simulated data and in the real data, we provide recommendations for practitioners on the choice of variable selection methods under various contexts.

research
12/16/2022

The CDF penalty: sparse and quasi unbiased estimation in regression models

In high-dimensional regression modelling, the number of candidate covari...
research
06/08/2023

Comprehensive Stepwise Selection for Logistic Regression

Automated variable selection is widely applied in statistical model deve...
research
02/17/2021

Split Modeling for High-Dimensional Logistic Regression

A novel method is proposed to learn an ensemble of logistic classificati...
research
07/01/2019

State-of-the-art in selection of variables and functional forms in multivariable analysis -- outstanding issues

How to select variables and identify functional forms for continuous var...
research
07/25/2018

Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework

Background: Choosing the most performing method in terms of outcome pred...
research
04/22/2016

Developing an ICU scoring system with interaction terms using a genetic algorithm

ICU mortality scoring systems attempt to predict patient mortality using...
research
03/27/2022

Interpretable Machine Learning Models for Modal Split Prediction in Transportation Systems

Modal split prediction in transportation networks has the potential to s...
