Strong Sure Screening of Ultra-high Dimensional Categorical Data

01/10/2018
by Randall Reese, et al.

Feature screening for ultra-high dimensional feature spaces plays a critical role in the analysis of data sets in which the number of predictors exponentially exceeds the number of observations. Such data sets are becoming increasingly prevalent in areas such as bioinformatics, medical imaging, and social network analysis. Frequently, these data sets have both a categorical response and categorical covariates, yet the extant feature screening literature rarely considers such data types. We propose a new screening procedure rooted in the Cochran-Armitage trend test. Our method is specifically applicable to data where both the response and the predictors are categorical. Under a set of reasonable conditions, we demonstrate that our screening procedure has the strong sure screening property, which extends the seminal results of Fan and Lv. A series of four simulations is used to investigate the performance of our method relative to three other screening methods. We also apply a two-stage iterative approach to a real data example by first employing our proposed method and then further screening a subset of selected covariates using lasso, adaptive lasso, and elastic net regularization.
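
To make the idea of trend-test-based screening concrete, the following is a minimal sketch, not the authors' procedure. It assumes a binary 0/1 response, predictors whose levels are scored 0, 1, 2, ... in sorted order, and the textbook Cochran-Armitage statistic for a 2 x k table; the function names `cochran_armitage_z` and `screen_top_d`, the simulated data, and the cutoff `d` are all hypothetical choices made for illustration. The paper's actual screening statistic and its handling of multi-class responses may differ.

```python
import numpy as np

def cochran_armitage_z(x, y, scores=None):
    """Textbook Cochran-Armitage trend statistic for binary y and categorical x.

    Assumption: y is coded 0/1 and the sorted levels of x get scores 0, 1, 2, ...
    unless explicit scores are supplied.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    levels = np.unique(x)
    if scores is None:
        scores = np.arange(len(levels), dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Build the 2 x k contingency table: row 1 is y == 1, row 2 is y == 0.
    n1 = np.array([np.sum((x == lv) & (y == 1)) for lv in levels], dtype=float)
    n2 = np.array([np.sum((x == lv) & (y == 0)) for lv in levels], dtype=float)
    col = n1 + n2
    r1, r2 = n1.sum(), n2.sum()
    n = r1 + r2

    # Trend numerator and its variance under the null of no association.
    # The variance is written in the algebraically equivalent form
    # (r1 * r2 / n) * (n * sum(s^2 * C) - (sum(s * C))^2).
    t = np.sum(scores * (n1 * r2 - n2 * r1))
    var = (r1 * r2 / n) * (n * np.sum(scores**2 * col) - np.sum(scores * col) ** 2)
    if var <= 0:
        return 0.0  # constant predictor or constant response: no trend information
    return float(t / np.sqrt(var))

def screen_top_d(X, y, d):
    """Rank columns of X by |Z| and keep the indices of the d largest (hypothetical helper)."""
    stats = np.array([abs(cochran_armitage_z(X[:, j], y)) for j in range(X.shape[1])])
    order = np.argsort(-stats)
    return order[:d], stats

# Simulated example (illustrative only): 200 observations, 50 three-level predictors,
# with the response depending on column 3 alone.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50))
y = rng.binomial(1, 0.1 + 0.4 * X[:, 3])
selected, stats = screen_top_d(X, y, d=5)
```

In this sketch the informative column should rank at or near the top of `selected`, since its trend statistic is far larger than those of the noise columns. Ranking by the absolute statistic and keeping a fixed number of features is only one possible thresholding rule.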


