Distributed Feature Screening via Componentwise Debiasing

03/09/2019
by Xingxiang Li, et al.

Feature screening is a powerful tool in the analysis of high-dimensional data. When the sample size N and the number of features p are both large, implementing classic screening methods can be numerically challenging. In this paper, we propose a distributed screening framework for the big data setting. In the spirit of "divide-and-conquer", the proposed framework expresses a correlation measure as a function of several component parameters, each of which can be estimated distributively using a natural U-statistic computed on data segments. Aggregating the component estimates yields a final correlation estimate that can be used directly for screening features. The framework enables distributed storage and parallel computing and is therefore computationally attractive. Because the component parameters are estimated without bias on each segment, the final aggregated estimate achieves high accuracy that is insensitive to the number of data segments m, whether m is dictated by the problem itself or chosen by the user. Under mild conditions, we show that the aggregated correlation estimator is as efficient as the classic centralized estimator in terms of the probability convergence bound, and that the corresponding screening procedure enjoys the sure screening property for a wide range of correlation measures. Extensive numerical examples support the promising performance of the new method.
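The componentwise idea can be illustrated with a minimal Python sketch. It assumes Pearson correlation as the screening measure and takes the covariance and the two variances as the component parameters. On each segment, each component is estimated by the degree-2 U-statistic with kernel (x1 - x2)(y1 - y2)/2, which coincides with the usual sample (co)variance using denominator n - 1. The segment estimates are averaged with size weights before the ratio is formed, so the unbiasedness of the components carries through the aggregation. The baseline `naive_corr` instead averages per-segment correlations. Each of those carries a bias of roughly order 1/n = m/N that averaging does not remove. All function names and the simulated data are hypothetical. This is not the authors' implementation, and their decomposition for other correlation measures may differ.

```python
import math
import random


def segment_components(xs, ys):
    """Unbiased component estimates (cov, var_x, var_y) on one data segment.

    Each value equals the degree-2 U-statistic with kernel
    h((x1, y1), (x2, y2)) = (x1 - x2) * (y1 - y2) / 2, i.e. the usual
    sample (co)variance with denominator n - 1.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("each segment needs at least two observations")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    return n, cov, var_x, var_y


def aggregated_corr(segments):
    """Pool unbiased components across segments, then form the correlation."""
    parts = [segment_components(xs, ys) for xs, ys in segments]
    total = sum(n for n, *_ in parts)
    # A size-weighted average of unbiased estimates is still unbiased.
    cov = sum(n * c for n, c, _, _ in parts) / total
    var_x = sum(n * vx for n, _, vx, _ in parts) / total
    var_y = sum(n * vy for n, _, _, vy in parts) / total
    return cov / math.sqrt(var_x * var_y)


def naive_corr(segments):
    """Baseline: plain average of per-segment sample correlations."""
    vals = []
    for xs, ys in segments:
        _, c, vx, vy = segment_components(xs, ys)
        vals.append(c / math.sqrt(vx * vy))
    return sum(vals) / len(vals)


def screen(feature_segments, y_segments, d):
    """Keep the d features with the largest |aggregated correlation| with y.

    feature_segments[j][k] holds feature j's values on segment k;
    y_segments[k] holds the response values on segment k.
    """
    scores = []
    for j, segs in enumerate(feature_segments):
        r = aggregated_corr(list(zip(segs, y_segments)))
        scores.append((abs(r), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:d])


if __name__ == "__main__":
    # Hypothetical simulation: only features 0 and 1 drive the response.
    random.seed(0)
    N, p, m = 2000, 50, 20
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(N)]
    y = [2 * row[0] - 1.5 * row[1] + random.gauss(0, 1) for row in X]
    size = N // m
    y_segs = [y[k * size:(k + 1) * size] for k in range(m)]
    feat_segs = [
        [[X[i][j] for i in range(k * size, (k + 1) * size)] for k in range(m)]
        for j in range(p)
    ]
    print(screen(feat_segs, y_segs, d=5))
```

With this signal strength, features 0 and 1 should appear among the five retained indices. Pooling the components rather than the final correlations is the design choice that, in this sketch, keeps the estimate from picking up extra bias as m grows.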



09/23/2022

Sure Screening for Transelliptical Graphical Models

We propose a sure screening approach for recovering the structure of a t...
05/11/2016

Interaction pursuit in high-dimensional multi-response regression via distance correlation

Feature interactions can contribute to a large proportion of variation i...
05/05/2015

On the Feasibility of Distributed Kernel Regression for Big Data

In modern scientific research, massive datasets with huge numbers of obs...
10/11/2017

Variable screening with multiple studies

Advancement in technology has generated abundant high-dimensional data t...
01/10/2022

SMLE: An R Package for Joint Feature Screening in Ultrahigh-dimensional GLMs

The sparsity-restricted maximum likelihood estimator (SMLE) has received...
11/04/2016

Classification with Ultrahigh-Dimensional Features

Although much progress has been made in classification with high-dimensi...
05/26/2022

Factor selection in screening experiments by aggregation over random models

Screening experiments are useful for screening out a small number of tru...
