On the consistency theory of high dimensional variable screening

02/24/2015
by Xiangyu Wang, et al.

Variable screening is a fast dimension reduction technique that assists high dimensional feature selection. As a preselection method, it selects a moderate-size subset of candidate variables, which feature selection then refines to produce the final model. The performance of variable screening depends on both computational efficiency and the ability to dramatically reduce the number of variables without discarding the important ones. When the data dimension p is substantially larger than the sample size n, variable screening becomes crucial because 1) faster feature selection algorithms are needed and 2) conditions guaranteeing selection consistency might fail to hold. This article studies a class of linear screening methods and establishes consistency theory for this special class. In particular, we prove that the restricted diagonally dominant (RDD) condition is a necessary and sufficient condition for strong screening consistency. As concrete examples, we show that two screening methods, SIS and HOLP, are both strong screening consistent (subject to additional constraints) with large probability if n > O((ρs + σ/τ)^2 log p) under random designs. In addition, we relate the RDD condition to the irrepresentable condition and highlight limitations of SIS.
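To make the two named screening methods concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give formulas, so the sketch assumes the standard definitions from the literature: SIS ranks variables by |X^T y| and HOLP by |X^T (X X^T)^{-1} y|, both of the linear form A y. NumPy, the function names, the simulated design, and the submodel size d are all choices made for this example.

```python
import numpy as np

def sis_scores(X, y):
    # SIS (assumed standard form): absolute marginal inner products |X^T y|.
    return np.abs(X.T @ y)

def holp_scores(X, y):
    # HOLP (assumed standard form): |X^T (X X^T)^{-1} y|.
    # Solving the n x n system avoids forming an explicit inverse.
    return np.abs(X.T @ np.linalg.solve(X @ X.T, y))

def screen(scores, d):
    # Keep the indices of the d largest scores, returned in sorted order.
    return np.sort(np.argsort(scores)[::-1][:d])

# Hypothetical simulated data: n samples, p >> n variables, s true signals.
rng = np.random.default_rng(0)
n, p, s = 200, 1000, 3
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
beta = np.zeros(p)
beta[:s] = 2.0                             # first s variables are important
y = X @ beta + rng.standard_normal(n)
y = y - y.mean()                           # center the response

d = 50                                     # illustrative submodel size
sis_keep = screen(sis_scores(X, y), d)
holp_keep = screen(holp_scores(X, y), d)
print("SIS retains true support: ", set(range(s)) <= set(sis_keep.tolist()))
print("HOLP retains true support:", set(range(s)) <= set(holp_keep.tolist()))
```

Both methods reduce p = 1000 candidates to d = 50 in a single pass; a feature selection method would then be run on the retained columns only. The independent Gaussian design used here is a favorable case for SIS, so it does not exhibit the limitations of SIS that the abstract mentions.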


