Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique

08/01/2023
by Junxian Zhu, et al.

In high-dimensional generalized linear models, it is crucial to identify a sparse model that adequately accounts for variation in the response. Although best-subset selection has been widely regarded as the Holy Grail of problems of this type, achieving either computational efficiency or statistical guarantees for it is challenging. In this article, we aim to overcome this obstacle with a fast algorithm that selects the best subset with high probability. We propose and illustrate an algorithm for best-subset recovery under regularity conditions. Under mild conditions, the computational complexity of our algorithm scales polynomially with sample size and dimension. In addition to establishing the statistical properties of our method, we present extensive numerical experiments showing that it outperforms existing methods for variable selection and coefficient estimation. A runtime analysis shows that our implementation achieves approximately a fourfold speedup over popular variable selection toolkits such as glmnet and ncvreg.
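To make the idea of best-subset search by splicing concrete, here is a minimal sketch for logistic regression (a GLM with a logit link), written under stated assumptions rather than taken from the paper. For a fixed support size `s`, it repeatedly swaps the `k` active variables whose removal should cost the least against the `k` inactive variables whose inclusion should help the most. It accepts a swap only if the refitted loss drops by more than a threshold `tau`. The "sacrifice" scores come from a diagonal quadratic approximation of the loss. That approximation, the helper names (`fit_on_support`, `splice`), the tiny ridge term, the initialization, and all default parameters are our illustrative assumptions. They are not necessarily the exact quantities or tuning used by the authors.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function (no overflow for large |z|)
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def logistic_loss(X, y, beta):
    # Mean negative log-likelihood for labels y in {0, 1}; no intercept
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z)


def fit_on_support(X, y, support, n_iter=50, ridge=1e-6):
    # Newton's method using only the columns in `support`; the tiny ridge
    # term (an assumption) keeps the Hessian solve well-conditioned.
    n, p = X.shape
    Xs = X[:, support]
    b = np.zeros(len(support))
    for _ in range(n_iter):
        mu = sigmoid(Xs @ b)
        grad = Xs.T @ (mu - y) / n + ridge * b
        w = mu * (1.0 - mu)
        H = (Xs.T * w) @ Xs / n + ridge * np.eye(len(support))
        step = np.linalg.solve(H, grad)
        b -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    beta = np.zeros(p)
    beta[support] = b
    return beta


def splice(X, y, s, k_max=None, max_iter=20, tau=1e-8):
    # Hypothetical fixed-size best-subset search by swapping ("splicing").
    n, p = X.shape
    k_max = s if k_max is None else k_max
    # Initial active set: the s largest |score| values at beta = 0
    score0 = X.T @ (y - 0.5) / n
    active = [int(j) for j in np.argsort(-np.abs(score0))[:s]]
    beta = fit_on_support(X, y, active)
    loss = logistic_loss(X, y, beta)
    for _ in range(max_iter):
        mu = sigmoid(X @ beta)
        h = (X**2 * (mu * (1.0 - mu))[:, None]).sum(axis=0) / n  # Hessian diagonal
        d = X.T @ (y - mu) / n                                   # negative gradient
        backward = 0.5 * h * beta**2           # approx. loss increase if j is dropped
        forward = 0.5 * d**2 / (h + 1e-12)     # approx. loss decrease if j is added
        inactive = [j for j in range(p) if j not in active]
        improved = False
        for k in range(1, min(k_max, len(inactive)) + 1):
            drop = sorted(active, key=lambda j: backward[j])[:k]
            add = sorted(inactive, key=lambda j: -forward[j])[:k]
            cand = sorted((set(active) - set(drop)) | set(add))
            beta_c = fit_on_support(X, y, cand)
            loss_c = logistic_loss(X, y, beta_c)
            if loss - loss_c > tau:  # accept the swap only if the loss drops
                active, beta, loss = cand, beta_c, loss_c
                improved = True
                break
        if not improved:
            break  # no swap of size 1..k_max improves the loss
    return sorted(active), beta


if __name__ == "__main__":
    # Simulated example: 3 true signals among 20 standard-normal features
    rng = np.random.default_rng(0)
    X = rng.standard_normal((800, 20))
    true_beta = np.zeros(20)
    true_beta[[2, 7, 11]] = [3.0, -3.0, 2.5]
    y = (rng.random(800) < sigmoid(X @ true_beta)).astype(float)
    print(splice(X, y, 3)[0])
```

With well-separated signals like these, the search should return the true support. In practice the support size `s` would also need to be chosen, for example with an information criterion. That step is omitted from this sketch.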
