Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

09/09/2008
by Francis Bach

For supervised and unsupervised learning, positive definite kernels make it possible to use large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done by penalizing predictor functions with Euclidean or Hilbertian norms. In this paper, we explore penalization by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels that can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework applies naturally to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
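To make the sparsity-inducing penalty concrete, here is a minimal NumPy sketch; it is our illustration, not the paper's hierarchical algorithm. It builds Gaussian basis kernels on individual input variables, so a sparse weight vector over kernels performs nonlinear variable selection, and it learns nonnegative kernel weights with the classical alternating scheme that arises from the variational form of the l1-norm penalty. The names gaussian_kernel_1d and mkl_ridge and all parameter values are illustrative assumptions.

```python
# Minimal sketch, assuming Gaussian basis kernels on single variables and the
# classical alternating MKL scheme; not the paper's hierarchical algorithm.
import numpy as np

def gaussian_kernel_1d(x, width=1.0):
    """Gaussian kernel matrix built from one input variable (1-d array x)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def mkl_ridge(kernels, y, lam=0.1, n_iter=50, tol=1e-8):
    """Alternate between kernel ridge regression on the weighted sum
    K(eta) = sum_j eta_j K_j and the closed-form simplex update
    eta_j <- eta_j * sqrt(alpha' K_j alpha) (renormalized), which solves the
    variational form of the l1 penalty and shrinks irrelevant weights to zero."""
    n, p = len(y), len(kernels)
    eta = np.full(p, 1.0 / p)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        K = sum(e * Kj for e, Kj in zip(eta, kernels))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)  # dual ridge solution
        norms = eta * np.sqrt([max(alpha @ Kj @ alpha, 0.0) for Kj in kernels])
        if norms.sum() < tol:                            # all weights vanished
            break
        new_eta = norms / norms.sum()
        if np.max(np.abs(new_eta - eta)) < tol:          # converged
            eta = new_eta
            break
        eta = new_eta
    return eta, alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = np.sin(X[:, 0]) + X[:, 1] ** 2        # only variables 0 and 1 are relevant
kernels = [gaussian_kernel_1d(X[:, j]) for j in range(X.shape[1])]
eta, _ = mkl_ridge(kernels, y)
print(np.round(eta, 3))  # weight mass should concentrate on kernels 0 and 1
```

In the paper's setting, the basis kernels are additionally organized in a directed acyclic graph and the block l1-norm is applied to groups of descendant kernels; it is this hierarchical structure that makes kernel selection possible in polynomial time in the number of selected kernels.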

