Scalable kernel-based variable selection with sparsistency

02/26/2018
by Xin He, et al.

Variable selection is central to high-dimensional data analysis, and various algorithms have been developed for it. Ideally, a variable selection algorithm should be flexible, scalable, and backed by theoretical guarantees, yet most existing algorithms cannot attain all three properties at once. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of the regression function and of its gradient functions, together with a hard-thresholding step. Its key advantage is that it requires no explicit model assumptions, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension, and can be further reduced through parallel computing. The sparsistency of the proposed algorithm is established for general RKHS under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.
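
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimator's details, so everything specific here is an assumption made for illustration: kernel ridge regression as the kernel-based fit, a Gaussian kernel, the empirical mean of the squared partial derivative as the per-variable score, and the hypothetical names and default values of `sigma`, `lam`, and `threshold`. It should not be read as the authors' implementation.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||X_i - Z_j||^2 / (2 sigma^2))."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def select_variables(X, y, sigma=1.0, lam=1e-2, threshold=0.1):
    """Illustrative three-step selection (all tuning values are assumptions).

    Returns the indices whose gradient score exceeds `threshold`,
    together with the score of every variable.
    """
    n, p = X.shape
    K = gaussian_kernel(X, X, sigma)

    # Step 1: fit f(x) = sum_j alpha_j K(x, x_j) by kernel ridge regression.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)

    # Step 2: evaluate each partial derivative of the fit at the sample points.
    # For the Gaussian kernel, d/dx_l K(x, x_j) = -(x_l - x_jl) / sigma^2 * K(x, x_j).
    scores = np.empty(p)
    for l in range(p):  # one pass per variable, so the cost grows linearly in p
        diff = X[:, l][:, None] - X[:, l][None, :]  # diff[i, j] = x_il - x_jl
        grad_l = (-(diff / sigma**2) * K) @ alpha   # df/dx_l at each x_i
        scores[l] = np.mean(grad_l**2)              # empirical squared norm

    # Step 3: hard thresholding of the gradient scores.
    selected = np.flatnonzero(scores > threshold)
    return selected, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(200, 6))
    # Only the first two variables carry signal in this toy example.
    y = 3.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
    selected, scores = select_variables(X, y, sigma=2.0, lam=1e-3, threshold=0.5)
    print("selected:", selected)
    print("scores:", np.round(scores, 4))
```

In this sketch the loop over variables is the only part that depends on the dimension, and its iterations are independent of one another, which is one way to read the abstract's remarks on linear cost and parallel computing.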

research
09/29/2021

A gradient-based variable selection for binary classification in reproducing kernel Hilbert space

Variable selection is essential in high-dimensional data analysis. Altho...
research
01/03/2019

Sparse Learning in reproducing kernel Hilbert space

Sparse learning aims to learn the sparse structure of the true target fu...
research
06/02/2018

Variable Selection for Nonparametric Learning with Power Series Kernels

In this paper, we propose a variable selection method for general nonpar...
research
06/23/2021

The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time

Many scientific problems require identifying a small set of covariates t...
research
06/05/2018

Selection and Estimation Optimality in High Dimensions with the TWIN Penalty

We introduce a novel class of variable selection penalties called TWIN, ...
research
12/06/2007

Kernels and Ensembles: Perspectives on Statistical Learning

Since their emergence in the 1990's, the support vector machine and the ...
research
04/22/2016

An improved chromosome formulation for genetic algorithms applied to variable selection with the inclusion of interaction terms

Genetic algorithms are a well-known method for tackling the problem of v...