A General Framework of Nonparametric Feature Selection in High-Dimensional Data
Nonparametric feature selection in high-dimensional data is an important and challenging problem in statistics and machine learning. Most existing feature selection methods focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework for nonparametric feature selection in both regression and classification problems. In this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. Computationally, we minimize the empirical risk with a penalty to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove both the oracle selection property and the Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and an application to a microarray study of eye disease in animals.
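The alternating scheme described above (a convex subproblem for the prediction function given the kernel parameters, then an update of the feature-importance parameters under a sparsity penalty) can be illustrated with a minimal sketch. The specific choices here, a weighted Gaussian kernel, a kernel ridge regression step, and a proximal gradient step with an L1 penalty on the importance weights, are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# Minimal illustrative sketch (NOT the paper's exact method): alternating
# optimization of a prediction function and feature-importance kernel weights.
import numpy as np

def weighted_gaussian_kernel(X, Z, w):
    """k(x, z) = exp(-sum_j w_j * (x_j - z_j)^2); w_j >= 0 weights feature j.
    A zero weight removes the feature from the kernel entirely."""
    diff = X[:, None, :] - Z[None, :, :]
    return np.exp(-np.einsum('ijk,k->ij', diff ** 2, w))

def fit(X, y, lam_f=1e-2, lam_w=1e-3, n_outer=20, lr=0.05):
    n, p = X.shape
    w = np.full(p, 1.0 / p)          # start with equal feature importances
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    for _ in range(n_outer):
        # Step 1: kernel ridge regression given w (convex in alpha).
        K = weighted_gaussian_kernel(X, X, w)
        alpha = np.linalg.solve(K + lam_f * n * np.eye(n), y)
        # Step 2: proximal gradient step on w for the squared-error risk,
        # holding alpha fixed.  dK_ij/dw_k = -diff2[i, j, k] * K_ij.
        resid = K @ alpha - y
        dL_dK = 2.0 * np.outer(resid, alpha) / n
        grad = np.einsum('ij,ijk->k', dL_dK * (-K), diff2)
        # Soft-threshold (L1 penalty) and project onto w >= 0 for sparsity.
        w = np.maximum(w - lr * grad - lr * lam_w, 0.0)
    return w, alpha

# Toy data: only the first two of ten features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=80)
w, alpha = fit(X, y)
```

Each outer iteration solves one convex problem per block of variables, matching the "iteratively solving convex optimization problems" structure described above; features whose weights shrink to zero are effectively deselected.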