Approximation beats concentration? An approximation view on inference with smooth radial kernels

01/10/2018
by Mikhail Belkin, et al.

Positive definite kernels and their associated Reproducing Kernel Hilbert Spaces provide a mathematically compelling and practically competitive framework for learning from data. In this paper we take the approximation theory point of view to explore various aspects of smooth kernels related to their inferential properties. We analyze the eigenvalue decay of kernel operators and matrices, the properties of eigenfunctions/eigenvectors, and the "Fourier" coefficients of functions in the kernel space restricted to a discrete set of data points. We also investigate the fitting capacity of kernels, giving explicit bounds on the fat shattering dimension of the balls in Reproducing Kernel Hilbert Spaces. Interestingly, the same properties that make kernels very effective approximators for functions in their "native" kernel space also limit their capacity to represent arbitrary functions. We discuss various implications, including those for gradient descent type methods. It is important to note that most of our bounds are measure independent. Moreover, at least in moderate dimension, the bounds for eigenvalues are much tighter than the bounds which can be obtained from the usual matrix concentration results. For example, we see that the eigenvalues of kernel matrices show nearly exponential decay with constants depending only on the kernel and the domain. We call this the "approximation beats concentration" phenomenon, as even when the data are sampled from a probability distribution, some of their aspects are better understood in terms of approximation theory.
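The eigenvalue decay mentioned in the abstract can be observed numerically. The following is a minimal sketch, not the paper's experiment: the Gaussian kernel, the bandwidth of 1.0, the 200 points drawn uniformly from [-1, 1], and the helper names `gaussian_kernel_matrix` and `normalized_spectrum` are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel_matrix(X, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq_norms = np.sum(X**2, axis=1)
    # Pairwise squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def normalized_spectrum(X, bandwidth=1.0):
    """Eigenvalues of K / n, sorted from largest to smallest."""
    K = gaussian_kernel_matrix(X, bandwidth)
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order.
    eigvals = np.linalg.eigvalsh(K / X.shape[0])
    return eigvals[::-1]


# Illustrative setup (assumed, not taken from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
spectrum = normalized_spectrum(X, bandwidth=1.0)

# Print a few eigenvalues to see how quickly they shrink with the index.
for k in (0, 2, 4, 6, 8, 10):
    print(k, spectrum[k])
```

Changing the sampling distribution (for example, replacing `rng.uniform` with a clipped normal sample on the same interval) is one way to probe the claim that the decay depends on the kernel and the domain rather than on the measure; beyond roughly the first dozen indices the printed values reach floating-point noise and should not be read as meaningful.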


research
11/28/2009

Positive Definite Kernels in Machine Learning

This survey is an introduction to positive definite kernels and the set ...
research
05/04/2020

Lecture notes: Efficient approximation of kernel functions

These lecture notes endeavour to collect in one place the mathematical b...
research
10/27/2018

Stein Variational Gradient Descent as Moment Matching

Stein variational gradient descent (SVGD) is a non-parametric inference ...
research
12/05/2018

Relative concentration bounds for the kernel matrix spectrum

In this paper, we study the concentration properties of the kernel matri...
research
10/30/2019

Spectral properties of kernel matrices in the flat limit

Kernel matrices are of central importance to many applied fields. In thi...
research
12/02/2021

The Representation Jensen-Rényi Divergence

We introduce a divergence measure between data distributions based on op...
research
11/11/2012

Measures of Entropy from Data Using Infinitely Divisible Kernels

Information theory provides principled ways to analyze different inferen...
