Robust distance correlation for variable screening

12/26/2022
by Tianzhou Ma, et al.

High-dimensional data are common in modern statistical applications, and variable selection methods play an indispensable role in identifying the features critical for scientific discovery. Traditional best subset selection methods are computationally intractable with a large number of features, while regularization methods such as Lasso, SCAD and their variants perform poorly on ultrahigh-dimensional data because of low computational efficiency and algorithmic instability. Sure screening methods have become popular alternatives: they first rapidly reduce the dimension using a simple measure such as marginal correlation, and then apply a regularization method. A number of screening methods have been developed for different models and problems; however, none has targeted data with heavy tails, another important characteristic of modern big data. In this paper, we propose a robust distance correlation (“RDC”) based sure screening method for ultrahigh-dimensional regression with heavy-tailed data. The proposed method shares the good properties of the original model-free distance correlation based screening, with the additional merit of robustly estimating the distance correlation when the data are heavy-tailed, which improves model selection performance in screening. We conducted extensive simulations under different heavy-tailed scenarios to demonstrate the advantage of the proposed procedure over existing model-based and model-free screening procedures in terms of feature selection and prediction performance. We also applied the method to high-dimensional heavy-tailed RNA sequencing (RNA-seq) data from The Cancer Genome Atlas (TCGA) pancreatic cancer cohort, where RDC outperformed the other methods in prioritizing the most essential and biologically meaningful genes.
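For orientation, the classical (non-robust) distance correlation screening that the abstract builds on can be sketched as follows. This is a minimal illustration assuming the standard sample distance correlation computed from double-centered pairwise distance matrices; it is not the RDC estimator, whose details are not given in the abstract. The names `distance_correlation`, `dc_screen` and the cutoff `d` are hypothetical, chosen only for this example.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered pairwise Euclidean distance matrix of a sample."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Classical sample distance correlation (V-statistic form), in [0, 1]."""
    a = _centered_dist(x)
    b = _centered_dist(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom <= 0:
        return 0.0              # a constant sample carries no dependence signal
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dc_screen(X, y, d):
    """Rank the columns of X by marginal distance correlation with y, keep the top d.

    Assumption: d is a user-chosen cutoff; the abstract does not specify one.
    """
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    # Hypothetical toy data: one active feature plus heavy-tailed (t, 2 df) noise.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = 3.0 * X[:, 0] + rng.standard_t(df=2, size=200)
    keep, scores = dc_screen(X, y, d=5)
    print("retained feature indices:", keep)
```

The plain estimator above averages raw pairwise distances, so a few extreme observations can dominate it; that sensitivity is the motivation the abstract gives for a robust variant.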


research
08/19/2019

Model-free Feature Screening with Projection Correlation and FDR Control with Knockoff Features

This paper proposes a model-free and data-adaptive feature screening met...
research
05/09/2023

Robust Model Selection with Application in Single-Cell Multiomics Data

Model selection is critical in the modern statistics and machine learnin...
research
06/11/2013

DISCOMAX: A Proximity-Preserving Distance Correlation Maximization Algorithm

In a regression setting we propose algorithms that reduce the dimensiona...
research
08/19/2019

Model-free Feature Screening and FDR Control with Knockoff Features

This paper proposes a model-free and data-adaptive feature screening met...
research
10/11/2017

Variable screening with multiple studies

Advancement in technology has generated abundant high-dimensional data t...
research
07/24/2021

A Robust Partial Correlation-based Screening Approach

As a computationally fast and working efficient tool, sure independence ...
research
02/05/2018

Copula-based Partial Correlation Screening: a Joint and Robust Approach

Screening for ultrahigh dimensional features may encounter complicated i...
