Gradient Descent in Neural Networks as Sequential Learning in RKBS

02/01/2023
by Alistair Shilton, et al.

The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network with a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime but a poor approximation for narrower networks, whose weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, from data space and weight-step space respectively, into feature space, allowing neural network training to be analyzed from the perspective of reproducing kernel Banach spaces (RKBS). We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning in RKBS. Using this, we present a novel bound on uniform convergence in which the iteration count and learning rate play a central role, giving new theoretical insight into neural network training.
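As context for the NTK approximation discussed in the abstract, the following is a minimal sketch (not taken from the paper) of the first-order Taylor linearization of a network around its initial weights, written in JAX; the two-layer mlp, its size, initialization, and all variable names are illustrative assumptions.

```python
# Minimal sketch: NTK-style first-order Taylor linearization of a small MLP
# around its initial weights w0:
#   f_lin(w, x) = f(w0, x) + <grad_w f(w0, x), w - w0>
# The network, its width, and the initialization below are illustrative only.
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Two-layer tanh network; params = ((W1, b1), (W2, b2)).
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(x @ W1 + b1)
    return h @ W2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
d_in, d_hid = 3, 64
init_params = (
    (jax.random.normal(k1, (d_in, d_hid)) / jnp.sqrt(d_in), jnp.zeros(d_hid)),
    (jax.random.normal(k2, (d_hid, 1)) / jnp.sqrt(d_hid), jnp.zeros(1)),
)

def linearized_mlp(params, x):
    # Directional derivative of f at w0 along the weight step (w - w0),
    # computed with a Jacobian-vector product.
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, init_params)
    f0, df = jax.jvp(lambda p: mlp(p, x), (init_params,), (delta,))
    return f0 + df

x = jax.random.normal(key, (5, d_in))
# At the initial weights the linearized model coincides with the network.
print(jnp.allclose(mlp(init_params, x), linearized_mlp(init_params, x)))
```

In the wide limit this linearization remains accurate throughout training, which is what lets NTK analysis cast training as kernel regression in an RKHS; for narrower networks the weight step grows too large for a first-order expansion, which is what motivates the paper's exact power-series (RKBS) representation.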
