Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum

07/26/2023
by Amnon Geifman et al.

Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues, even when no closed form for such kernels is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method that alters the trajectory of GD. This yields a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.
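To make the idea concrete, below is a minimal NumPy sketch of spectrum-based preconditioning in the kernel regression setting. It is not the authors' implementation: the RBF kernel standing in for an NTK, the function names, and the choice g(lam) = 1/lam^2 (clipped for stability) are all illustrative assumptions. It shows how a positive definite preconditioner P = U diag(g(lam)) U^T, built from the kernel's eigendecomposition, rescales the per-eigendirection convergence rate of GD while leaving the least-squares minimizer unchanged.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel as a stand-in for an NTK; any PSD kernel works.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def spectral_preconditioner(K, g, eps=1e-6):
    # Build P = U diag(g(lam)) U^T from the eigendecomposition of K.
    # Any positive definite P preserves the stationary points of the
    # least-squares objective, so the final solution is unchanged.
    lam, U = np.linalg.eigh(K)
    return (U * g(np.maximum(lam, eps))) @ U.T

def preconditioned_gd(K, y, P, lr=0.5, steps=200):
    # Minimize 0.5 * ||K a - y||^2 via  a <- a - lr * P @ grad,
    # where grad = K (K a - y). In K's eigenbasis the error in
    # direction i contracts by a factor of 1 - lr * g(lam_i) * lam_i^2.
    a = np.zeros_like(y)
    for _ in range(steps):
        a -= lr * P @ (K @ (K @ a - y))
    return a

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
K = rbf_kernel(X, X)

# g(lam) = 1/lam^2 equalizes the contraction factor across all
# eigendirections (clipping at eps keeps P bounded); g(lam) = 1
# recovers plain gradient descent.
P = spectral_preconditioner(K, lambda lam: 1.0 / lam**2)
alpha = preconditioned_gd(K, y, P)
print("training residual:", np.linalg.norm(K @ alpha - y))
```

With lr = 0.5 and g(lam) = 1/lam^2, every eigendirection with lam >= eps contracts by the same factor 0.5 per step, whereas plain GD fits low-eigenvalue directions far more slowly; directions below the clip are effectively frozen, the usual price of inverting a nearly singular spectrum.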


Related research:

- 05/29/2019, On the Inductive Bias of Neural Tangent Kernels: State-of-the-art neural networks are heavily over-parameterized, making ...
- 07/07/2020, Gradient Descent Converges to Ridgelet Spectrum: Deep learning achieves a high generalization performance in practice, de...
- 03/01/2021, Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels: We study the relative power of learning with gradient descent on differe...
- 07/03/2020, On the Similarity between the Laplace and Neural Tangent Kernels: Recent theoretical work has shown that massively overparameterized neura...
- 05/01/2012, A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning: We consider the problem of simultaneously learning to linearly combine a...
- 11/27/2022, A Kernel Perspective of Skip Connections in Convolutional Networks: Over-parameterized residual networks (ResNets) are amongst the most succ...
- 02/22/2020, On the Inductive Bias of a CNN for Orthogonal Patterns Distributions: Training overparameterized convolutional neural networks with gradient b...
