A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning

05/01/2012
by Arash Afkanpour, et al.

We consider the problem of simultaneously learning a linear combination of a very large number of kernels and learning a good predictor based on the learnt kernel. When the number d of kernels to be combined is very large, multiple kernel learning methods whose computational cost scales linearly in d are intractable. To overcome this issue, we propose a randomized version of the mirror descent algorithm for minimizing the group p-norm penalized empirical risk. The key to achieving the required exponential speed-up is the computationally efficient construction of low-variance estimates of the gradient. We propose importance-sampling-based estimates, and find that the ideal distribution samples a coordinate with probability proportional to the magnitude of the corresponding gradient. We show the surprising result that, when learning the coefficients of a polynomial kernel, the combinatorial structure of the base kernels to be combined allows sampling from this distribution in O(log(d)) time, so that the total computational cost of the method to reach an ϵ-optimal solution is O(log(d)/ϵ^2). This allows our method to operate for very large values of d. Experiments with simulated and real data confirm that the new algorithm is computationally more efficient than its state-of-the-art alternatives.
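The importance-sampling idea above can be illustrated with a minimal, generic sketch; it is not the authors' implementation. Assume a gradient vector g over d coordinates (one per kernel). Draw a single coordinate i with probability p_i and return the vector that is g_i/p_i at coordinate i and zero elsewhere. This estimate is unbiased, since E[ĝ] = Σ_i p_i (g_i/p_i) e_i = g, and its second moment is Σ_i g_i^2/p_i. Over the probability simplex that quantity is minimized by p_i ∝ |g_i|, where it equals ||g||_1^2. By Cauchy-Schwarz this never exceeds the uniform-sampling value d·||g||_2^2. All function names and the example gradient are hypothetical. Note that this sketch computes every |g_i| explicitly, so each draw costs O(d). It does not reproduce the paper's O(log(d)) sampler for polynomial-kernel base kernels.

```python
import random
from typing import List, Sequence, Tuple


def importance_probs(grad: Sequence[float]) -> List[float]:
    """p_i proportional to |grad[i]|; falls back to uniform if grad is all zeros."""
    mags = [abs(g) for g in grad]
    total = sum(mags)
    if total == 0.0:
        return [1.0 / len(grad)] * len(grad)
    return [m / total for m in mags]


def uniform_probs(d: int) -> List[float]:
    """Baseline distribution: every coordinate equally likely."""
    return [1.0 / d] * d


def sample_estimate(grad: Sequence[float], probs: Sequence[float],
                    rng: random.Random) -> Tuple[int, float]:
    """Draw one coordinate i ~ probs and return (i, grad[i] / probs[i]).

    The full estimate is the vector holding that value at coordinate i and
    zeros elsewhere. It is unbiased whenever probs[i] > 0 for every i with
    grad[i] != 0.
    """
    i = rng.choices(range(len(grad)), weights=probs, k=1)[0]
    return i, grad[i] / probs[i]


def second_moment(grad: Sequence[float], probs: Sequence[float]) -> float:
    """Exact E[||estimate||_2^2] = sum_i grad[i]^2 / probs[i], skipping probs[i] == 0."""
    return sum(g * g / p for g, p in zip(grad, probs) if p > 0)


def monte_carlo_mean(grad: Sequence[float], probs: Sequence[float],
                     n: int, rng: random.Random) -> List[float]:
    """Average n one-coordinate estimates; the result approaches grad as n grows."""
    acc = [0.0] * len(grad)
    for _ in range(n):
        i, value = sample_estimate(grad, probs, rng)
        acc[i] += value
    return [a / n for a in acc]


if __name__ == "__main__":
    grad = [3.0, -1.0, 0.0, 0.5]  # hypothetical gradient, for illustration only
    rng = random.Random(0)
    p_imp = importance_probs(grad)
    p_uni = uniform_probs(len(grad))
    print("importance second moment:", second_moment(grad, p_imp))  # ||g||_1^2
    print("uniform second moment:   ", second_moment(grad, p_uni))  # d * ||g||_2^2
    print("MC mean (importance):", monte_carlo_mean(grad, p_imp, 20000, rng))
```

For the example gradient, proportional sampling gives a second moment of ||g||_1^2 = 4.5^2 = 20.25. Uniform sampling gives 4 · 10.25 = 41. Every nonzero sampled value has magnitude ||g||_1, which is why this distribution keeps the estimate's variance low.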



research
09/17/2018

Statistically and Computationally Efficient Variance Estimator for Kernel Ridge Regression

In this paper, we propose a random projection approach to estimate varia...
research
12/20/2011

Alignment Based Kernel Learning with a Continuous Set of Base Kernels

The success of kernel-based learning methods depend on the choice of ker...
research
09/28/2009

SpicyMKL

We propose a new optimization algorithm for Multiple Kernel Learning (MK...
research
07/26/2023

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum

Wide neural networks are biased towards learning certain functions, infl...
research
05/26/2021

Submodular Kernels for Efficient Rankings

Many algorithms for ranked data become computationally intractable as th...
research
02/01/2013

Sparse Multiple Kernel Learning with Geometric Convergence Rate

In this paper, we study the problem of sparse multiple kernel learning (...
research
10/12/2017

New efficient algorithms for multiple change-point detection with kernels

Several statistical approaches based on reproducing kernels have been pr...
