Scalable Kernel Methods via Doubly Stochastic Gradients

07/21/2014
by Bo Dai, et al.

The general perception is that kernel methods are not scalable, and that neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve these problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and the other using random functions associated with the kernel, and then descending along this noisy functional gradient. We show that the function produced by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at rate O(1/t), and achieves a generalization error rate of O(1/√(t)). This doubly stochastic nature also allows us to avoid keeping the support vectors and to run the algorithm with a small memory footprint, which is linear in the number of iterations and independent of the data dimension. Our approach can readily scale kernel methods up to regimes that are currently dominated by neural nets. We show that our method achieves performance competitive with neural nets on datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.
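As a rough illustration of the idea (not the authors' implementation), the following Python sketch applies doubly stochastic functional gradients to a Gaussian RBF kernel: each iteration samples one random training point and one random Fourier feature, then takes a functional gradient step whose coefficient is stored for later prediction. The function name doubly_sgd and the hyperparameters (step-size schedule gamma0/(1+i), regularization nu, kernel bandwidth) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def doubly_sgd(X, y, loss_grad, n_iters=2000, gamma0=1.0, nu=1e-4,
               bandwidth=1.0, seed=0):
    """Sketch of doubly stochastic functional gradient descent for an
    RBF-kernel objective, using random Fourier features.

    loss_grad(f_x, y) returns dl/df, e.g. (f_x - y) for squared loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas = np.empty((n_iters, d))   # random feature frequencies
    phases = np.empty(n_iters)        # random phases b ~ Uniform[0, 2*pi]
    alphas = np.zeros(n_iters)        # one coefficient per iteration

    def feature(x, i):
        # Random Fourier feature of the Gaussian RBF kernel:
        # phi_omega(x) = sqrt(2) * cos(omega . x + b).
        return np.sqrt(2.0) * np.cos(omegas[i] @ x + phases[i])

    for i in range(n_iters):
        idx = rng.integers(n)                                  # random training point
        omegas[i] = rng.normal(0.0, 1.0 / bandwidth, size=d)   # random function
        phases[i] = rng.uniform(0.0, 2.0 * np.pi)
        # Evaluate the current function estimate at the sampled point.
        f_x = sum(alphas[j] * feature(X[idx], j) for j in range(i))
        gamma = gamma0 / (1.0 + i)          # decaying step size
        alphas[:i] *= 1.0 - gamma * nu      # shrinkage from L2 regularization
        # New coefficient from the doubly stochastic gradient estimate.
        alphas[i] = -gamma * loss_grad(f_x, y[idx]) * feature(X[idx], i)

    def predict(x):
        return sum(alphas[j] * feature(x, j) for j in range(n_iters))

    return predict
```

For squared loss one would pass loss_grad = lambda f_x, y: f_x - y. Note that prediction cost and memory both grow linearly with the number of iterations, mirroring the footprint described above; a practical implementation would typically process mini-batches and blocks of random features per iteration, which this single-sample sketch omits for clarity.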

Related research

Asymptotic Normality of Support Vector Machine Variants and Other Regularized Kernel Methods (10/04/2010)
In nonparametric classification and regression problems, regularized ker...

Preconditioning Kernel Matrices (02/22/2016)
The computational and storage complexity of kernel machines presents the...

An Online Projection Estimator for Nonparametric Regression in Reproducing Kernel Hilbert Spaces (04/01/2021)
The goal of nonparametric regression is to recover an underlying regress...

How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets (11/14/2014)
The computational complexity of kernel methods has often been a major ba...

Kernel Distributionally Robust Optimization (06/12/2020)
This paper is an in-depth investigation of using kernel methods to immun...

Kernelized Wasserstein Natural Gradient (10/21/2019)
Many machine learning problems can be expressed as the optimization of s...

Stochastic Texture Difference for Scale-Dependent Data Analysis (03/11/2015)
This article introduces the Stochastic Texture Difference method for ana...
