Kernel Alignment Risk Estimator: Risk Prediction from Training Data

06/17/2020
by Arthur Jacot, et al.

We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel K with ridge λ>0 and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT ϑ_{K,λ} is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This then leads to a KRR risk approximation by the KARE ρ_{K,λ}, an explicit function of the training data, agnostic of the true data distribution. We phrase the regression problem in a functional setting. The key results then follow from a finite-size analysis of the Stieltjes transform of general Wishart random matrices. Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations) we capture the mean and variance of the KRR predictor. We numerically investigate our findings on the Higgs and MNIST datasets for various classical kernels: the KARE gives an excellent approximation of the risk, thus supporting our universality assumption. Using the KARE, one can compare choices of kernels and hyperparameters directly from the training set. The KARE thus provides a promising data-dependent procedure to select kernels that generalize well.
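
To make "an explicit function of the training data" concrete, here is a minimal sketch of how such an estimator could be computed. The abstract does not state the KARE formula; the code assumes the form ρ = [(1/N) yᵀ(G/N + λI)⁻² y] / [(1/N) Tr (G/N + λI)⁻¹]², where G is the N×N Gram matrix and y the training labels, which should be checked against the full text. The function names `rbf_gram` and `kare`, the Gaussian kernel, and the synthetic data are illustrative choices, not taken from the source.

```python
import numpy as np


def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of a Gaussian (RBF) kernel on the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale ** 2))


def kare(G, y, ridge):
    """Assumed KARE form (not stated in the abstract):

    rho = [(1/N) y^T (G/N + ridge*I)^-2 y] / [(1/N) tr (G/N + ridge*I)^-1]^2
    """
    n = G.shape[0]
    A = G / n + ridge * np.eye(n)
    A_inv_y = np.linalg.solve(A, y)
    # A is symmetric, so y^T A^-2 y equals ||A^-1 y||^2.
    numerator = A_inv_y @ A_inv_y / n
    denominator = (np.trace(np.linalg.inv(A)) / n) ** 2
    return numerator / denominator


# Hypothetical usage: rank kernel lengthscales using training data only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
for lengthscale in (0.5, 1.0, 2.0, 4.0):
    score = kare(rbf_gram(X, lengthscale), y, ridge=1e-2)
    print(f"lengthscale={lengthscale}: KARE={score:.4f}")
```

Under this assumed form, a lower value would indicate a lower predicted risk, so candidate kernels and ridge values can be ranked without a held-out set. The explicit inverse costs O(N³) and is only reasonable for small N.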

research
04/19/2019

Risk Convergence of Centered Kernel Ridge Regression with Large Dimensional Data

This paper carries out a large dimensional analysis of a variation of ke...
research
09/17/2018

Statistically and Computationally Efficient Variance Estimator for Kernel Ridge Regression

In this paper, we propose a random projection approach to estimate varia...
research
02/19/2020

Implicit Regularization of Random Feature Models

Random Feature (RF) models are used as efficient parametric approximatio...
research
09/04/2022

On Kernel Regression with Data-Dependent Kernels

The primary hyperparameter in kernel regression (KR) is the choice of ke...
research
02/07/2022

Failure and success of the spectral bias prediction for Kernel Ridge Regression: the case of low-dimensional data

Recently, several theories including the replica method made predictions...
research
02/08/2022

Distribution Regression with Sliced Wasserstein Kernels

The problem of learning functions over spaces of probabilities - or dist...
research
05/23/2023

On the Size and Approximation Error of Distilled Sets

Dataset Distillation is the task of synthesizing small datasets from lar...
