Taming Nonconvexity in Kernel Feature Selection—Favorable Properties of the Laplace Kernel

by Feng Ruan, et al.

Kernel-based feature selection is an important tool in nonparametric statistics. Despite its many practical applications, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has studied only the statistical properties of the global optima, which is a mismatch, given that the gradient-based algorithms available for nonconvex optimization can only guarantee convergence to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other ℓ_1 kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (and other ℓ_2 kernels), do not possess. Based on a sharp characterization of the gradient of the objective function, we show that ℓ_1 kernels eliminate unfavorable stationary points that appear when using an ℓ_2 kernel. Armed with this insight, we establish statistical guarantees for ℓ_1 kernel-based feature selection that do not require reaching the global minima. In particular, we establish model-selection consistency of ℓ_1 kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with n ∼ log p samples.
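For orientation, the following is a minimal sketch of the two kernel families the abstract contrasts. It assumes a nonnegative per-feature weight vector, here called `beta`, which is a hypothetical parameterization chosen for illustration; the abstract does not state the paper's exact objective, and the function names are likewise illustrative.

```python
import numpy as np

def weighted_laplace_kernel(x, y, beta):
    """l1-type kernel: exp(-sum_j beta_j * |x_j - y_j|), assuming beta_j >= 0."""
    x, y, beta = (np.asarray(a, dtype=float) for a in (x, y, beta))
    return float(np.exp(-np.sum(beta * np.abs(x - y))))

def weighted_gaussian_kernel(x, y, beta):
    """l2-type kernel: exp(-sum_j beta_j * (x_j - y_j)**2), assuming beta_j >= 0."""
    x, y, beta = (np.asarray(a, dtype=float) for a in (x, y, beta))
    return float(np.exp(-np.sum(beta * (x - y) ** 2)))

# Illustrative inputs (not data from the paper).
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, 2.0, 5.0])
beta = np.array([1.0, 1.0, 0.0])  # a zero weight drops the third feature

k_l1 = weighted_laplace_kernel(x, y, beta)
k_l2 = weighted_gaussian_kernel(x, y, beta)
```

In this parameterization, a feature whose weight is zero has no influence on the kernel value, which is how a weight vector can encode feature selection in a sketch like this one.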




A Self-Penalizing Objective Function for Scalable Interaction Detection

We tackle the problem of nonparametric variable selection with a focus o...

On the Self-Penalization Phenomenon in Feature Selection

We describe an implicit sparsity-inducing mechanism based on minimizatio...

Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring

We consider the two-group classification problem and propose a kernel cl...

Feature Selection for Value Function Approximation Using Bayesian Model Selection

Feature selection in reinforcement learning (RL), i.e. choosing basis fu...

Efficient Sparse Group Feature Selection via Nonconvex Optimization

Sparse feature selection has been demonstrated to be effective in handli...

Learning from DPPs via Sampling: Beyond HKPV and symmetry

Determinantal point processes (DPPs) have become a significant tool for ...

Variational Autoencoder Kernel Interpretation and Selection for Classification

This work proposed kernel selection approaches for probabilistic classif...
