A Model-free Variable Screening Method Based on Leverage Score

09/21/2021
by Wenxuan Zhong, et al.

With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful information is often extracted from these data through statistical learning or model fitting. In massive datasets, both the sample size and the number of predictors can be large, in which case conventional methods face computational challenges. Recently, an effective sampling scheme based on leverage scores computed from the singular value decomposition has been proposed to select rows of a design matrix as a surrogate for the full data in linear regression. Analogously, variable screening can be viewed as selecting columns of the design matrix. However, effective variable selection along this line of thinking remains elusive. In this article, we bridge this gap by proposing a weighted leverage variable screening method that uses both the left and right singular vectors of the design matrix. We show theoretically and empirically that the predictors selected by our method consistently include the true predictors, not only for linear models but also for complicated general index models. Extensive simulation studies show that the weighted leverage screening method is computationally efficient and effective. We also demonstrate its success in identifying carcinoma-related genes using spatial transcriptome data.
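
To make the leverage-score idea in the abstract concrete, the following is a minimal sketch of classical leverage scores computed from a truncated SVD. It is not the weighted leverage score proposed in the paper, whose exact definition is not given in this abstract. The function name `leverage_scores`, the `rank` parameter, and the simulated data are assumptions made for illustration only. Row scores come from the left singular vectors (one per sample) and column scores from the right singular vectors (one per predictor).

```python
import numpy as np

def leverage_scores(X, rank=None):
    """Return (row_scores, col_scores) from the thin SVD X = U diag(s) Vt.

    Hypothetical helper for illustration; this is the classical
    leverage score, not the paper's weighted version.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if rank is not None:
        # Keep only the leading `rank` singular vectors.
        U, Vt = U[:, :rank], Vt[:rank, :]
    row_scores = np.sum(U**2, axis=1)   # squared row norms of U (samples)
    col_scores = np.sum(Vt**2, axis=0)  # squared column norms of Vt (predictors)
    return row_scores, col_scores

# Simulated design matrix: 50 samples, 200 predictors (assumed sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
X -= X.mean(axis=0)  # center each predictor

row_scores, col_scores = leverage_scores(X, rank=5)

# Indices of the 10 predictors with the largest column leverage scores.
top_columns = np.argsort(col_scores)[::-1][:10]
```

Because the retained singular vectors are orthonormal, each set of scores sums to the chosen rank, which gives a quick sanity check. Ranking predictors by `col_scores` alone ignores the response, which is presumably why the paper combines left and right singular vectors into a weighted score.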

research
06/09/2023

Variable screening using factor analysis for high-dimensional data with multicollinearity

Screening methods are useful tools for variable selection in regression ...
research
12/30/2017

An ISIS screening approach involving threshold/partition for variable selection in linear regression

In linear regression, one can select a predictor if the absolute sample ...
research
05/17/2011

Independent screening for single-index hazard rate models with ultra-high dimensional features

In data sets with many more features than observations, independent scre...
research
08/10/2020

A note of feature screening via rank-based coefficient of correlation

Feature screening is useful and popular to detect informative predictors...
research
09/17/2015

Optimal Subsampling Approaches for Large Sample Linear Regression

A significant hurdle for analyzing large sample data is the lack of effe...
research
05/17/2018

Covariance-Insured Screening

Modern bio-technologies have produced a vast amount of high-throughput d...
research
11/22/2010

Variational approximation for heteroscedastic linear models and matching pursuit algorithms

Modern statistical applications involving large data sets have focused a...
