KOO approach for scalable variable selection problem in large-dimensional regression

03/30/2023
by   Zhidong Bai, et al.

An important issue in many multivariate regression problems is eliminating candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are both large, model selection faces a scalability challenge. Knock-one-out (KOO) statistics hold promise for meeting this challenge. In this paper, the almost sure limits and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions (finite fourth moments) on the errors. These theoretical results guarantee the strong consistency of a subset selection rule based on the KOO statistics with a general threshold. To enhance the robustness of the selection rule, we also propose a bootstrap threshold for the KOO approach. Simulation results support our conclusions and demonstrate that the KOO approach with the bootstrap threshold achieves better selection probabilities than methods using the Akaike information threshold, the Bayesian information threshold, and Mallows's C_p threshold. We apply the proposed KOO approach and the information-threshold-based approaches to a chemometrics dataset and a yeast cell-cycle dataset; the results suggest that our proposed method identifies useful models.
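The abstract does not state the KOO statistic explicitly, so the following is only a rough sketch of the general idea of a knock-one-out selection rule. It assumes (this is an assumption, not taken from the text above) that the statistic for predictor j is the log-determinant of the residual sum-of-squares matrix with predictor j removed, minus the same quantity for the full model. The function names, the simulated data, and the fixed threshold value are all hypothetical; in particular, the fixed threshold is a placeholder and is not the bootstrap threshold proposed in the paper.

```python
import numpy as np

def residual_logdet(Y, X):
    """Log-determinant of the residual sum-of-squares matrix of Y regressed on X."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    _, logdet = np.linalg.slogdet(resid.T @ resid)
    return logdet

def koo_statistics(Y, X):
    """Assumed KOO-type statistic: one value per predictor (column of X)."""
    k = X.shape[1]
    full = residual_logdet(Y, X)
    stats = np.empty(k)
    for j in range(k):
        # Knock out predictor j and refit with the remaining k - 1 predictors.
        X_minus_j = np.delete(X, j, axis=1)
        stats[j] = residual_logdet(Y, X_minus_j) - full
    return stats

def koo_select(Y, X, threshold):
    """Keep the predictors whose statistic exceeds the threshold."""
    return np.flatnonzero(koo_statistics(Y, X) > threshold)

# Hypothetical simulated data: 6 candidate predictors, only the first 2 are active.
rng = np.random.default_rng(0)
n, p, k = 200, 3, 6
X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[:2, :] = 1.0
Y = X @ Theta + rng.standard_normal((n, p))

stats = koo_statistics(Y, X)
selected = koo_select(Y, X, threshold=0.2)  # 0.2 is an arbitrary illustrative value
print(stats)
print(selected)
```

Each predictor is handled by a single refit of the full model with one column removed, so the number of fits grows linearly with the number of candidates rather than with the number of subsets, which is the scalability point the abstract raises.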


research
12/30/2017

An ISIS screening approach involving threshold/partition for variable selection in linear regression

In linear regression, one can select a predictor if the absolute sample ...
research
02/17/2022

Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression

In traditional logistic regression models, the link function is often as...
research
11/15/2021

An Approach of Bayesian Variable Selection for Ultrahigh Dimensional Multivariate Regression

In many practices, scientists are particularly interested in detecting w...
research
03/06/2019

Economic variable selection

Regression plays a key role in many research areas and its variable sele...
research
09/08/2020

Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression

Given m d-dimensional responsors and n d-dimensional predictors, sparse ...
research
08/11/2021

Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles

In this paper, we extend our PrInDT method (Weihs & Buschfeld 2021a) t...
research
07/20/2021

Strategies for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Prior work has shown that combining bootstrap imputation with tree-based...
