Solving Kernel Ridge Regression with Gradient-Based Optimization Methods

06/29/2023
by Oskar Allerbo, et al.

Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the KRR objective function, which opens up both for using penalties other than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the ℓ_1 and ℓ_∞ penalties and utilize the fact that, analogously to the similarities between KGF and KRR, the solutions obtained with these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent, respectively, in combination with early stopping. The need for computationally heavy proximal gradient descent algorithms can thus be alleviated. We show theoretically and empirically how these penalties, and the corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth selected by cross-validation and marginal likelihood maximization. We also show that, by using a decreasing bandwidth, we are able to achieve both zero training error and a double descent behavior.
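To make the gradient-descent view concrete, below is a minimal sketch of kernel gradient descent with early stopping and an optional shrinking Gaussian-kernel bandwidth. It is illustrative only: the squared-error parameterization in terms of the coefficients alpha, the spectral step-size rule, and the geometric bandwidth schedule are assumptions made for this sketch, not the paper's exact KGF formulation or bandwidth update scheme.

```python
# Minimal sketch of kernel gradient descent with early stopping and an
# optional decreasing Gaussian-kernel bandwidth. Illustrative only; the
# step-size rule and the geometric bandwidth schedule are assumptions.
import numpy as np


def gaussian_kernel(X1, X2, sigma):
    """Gaussian (RBF) kernel matrix with bandwidth sigma."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def kernel_gradient_descent(X, y, sigma0=1.0, n_steps=300, decay=1.0):
    """Gradient descent on the coefficients alpha of f(x) = sum_i alpha_i k(x, x_i),
    minimizing 0.5 * ||y - K alpha||^2. A small n_steps acts as early stopping;
    decay < 1 shrinks the bandwidth during training."""
    n = X.shape[0]
    alpha = np.zeros(n)
    sigma = sigma0
    for _ in range(n_steps):
        K = gaussian_kernel(X, X, sigma)
        # Step size chosen inside the stability region of the quadratic objective.
        lr = 1.0 / np.linalg.eigvalsh(K)[-1] ** 2
        alpha += lr * K @ (y - K @ alpha)  # gradient step on alpha
        sigma *= decay                     # optional bandwidth shrinkage
    return alpha, sigma


# Usage on synthetic data: early-stopped fit with a slowly shrinking bandwidth.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)

alpha, sigma = kernel_gradient_descent(X, y, sigma0=1.0, n_steps=300, decay=0.995)
X_test = np.linspace(-3.0, 3.0, 200)[:, None]
y_pred = gaussian_kernel(X_test, X, sigma) @ alpha  # predictions at the final bandwidth
```

In this sketch, stopping after a small number of iterations plays the role the ridge penalty plays in KRR, and setting decay below one mimics the idea of letting the bandwidth decrease during training rather than fixing it by hyper-parameter selection.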


