Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel

07/27/2021
by Dominic Richards, et al.

We revisit on-average algorithmic stability of Gradient Descent (GD) for training overparameterised shallow neural networks and prove new generalisation and excess risk bounds without the Neural Tangent Kernel (NTK) or Polyak-Łojasiewicz (PL) assumptions. In particular, we show oracle-type bounds which reveal that the generalisation and excess risk of GD are controlled by an interpolating network with the shortest GD path from initialisation (in a sense, an interpolating network with the smallest relative norm). While this was known for kernelised interpolants, our proof applies directly to networks trained by GD without intermediate kernelisation. At the same time, by relaxing the oracle inequalities developed here we recover existing NTK-based risk bounds in a straightforward way, demonstrating that our analysis is tighter. Finally, unlike most NTK-based analyses, we focus on regression with label noise and show that GD with early stopping is consistent.
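To make the setting concrete, the sketch below (an illustration assumed for this page, not code from the paper) trains an overparameterised shallow ReLU network on noisy regression data with full-batch GD, applies early stopping on a held-out set, and tracks the distance of the weights from initialisation, a simple proxy for the GD path length that the oracle-type bounds above tie generalisation to. All names, widths, and step sizes are illustrative assumptions.

```python
# Illustrative sketch (assumptions, not the paper's code): full-batch GD on a
# shallow ReLU network for noisy regression, tracking the distance of the
# weights from initialisation and stopping early on a validation set.
import numpy as np

rng = np.random.default_rng(0)

# Noisy regression data: y = f*(x) + label noise.
n, d, m = 200, 5, 1000            # samples, input dim, hidden width (overparameterised)
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

# Shallow network f(x) = a^T relu(W x); only the hidden layer W is trained.
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # output layer held fixed
W = W0.copy()

def predict(W, X):
    return np.maximum(X @ W.T, 0.0) @ a

def grad(W, X, y):
    H = X @ W.T                              # pre-activations, shape (n, m)
    R = np.maximum(H, 0.0) @ a - y           # residuals, shape (n,)
    # gradient of 0.5 * mean squared error with respect to W
    return ((R[:, None] * (H > 0)) * a).T @ X / len(y)

lr, best_val, best_W = 0.2, np.inf, W.copy()
for t in range(2000):
    W -= lr * grad(W, X_tr, y_tr)
    val = 0.5 * np.mean((predict(W, X_val) - y_val) ** 2)
    if val < best_val:
        best_val, best_W = val, W.copy()     # early stopping: keep the best iterate
    if t % 200 == 0:
        dist = np.linalg.norm(W - W0)        # distance from initialisation ||W_t - W_0||
        print(f"t={t:4d}  val={val:.4f}  ||W - W0||={dist:.3f}")
```

In this sketch, `best_W` plays the role of the early-stopped predictor and `||W - W0||` the role of the GD path length from initialisation; the paper's point is that bounding generalisation through quantities of this kind does not require passing through the NTK.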


Related research

07/12/2021 · Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
We explore the ability of overparameterized shallow neural networks to l...

12/28/2022 · Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks
We explore the ability of overparameterized shallow ReLU neural networks...

09/14/2023 · How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent
We analyze the generalization properties of two-layer neural networks in...

09/19/2022 · Deep Linear Networks can Benignly Overfit when Shallow Ones Do
We bound the excess risk of interpolating deep linear networks trained u...

06/10/2021 · Early-stopped neural networks are consistent
This work studies the behavior of neural networks trained with the logis...

12/06/2020 · Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods
Establishing a theoretical analysis that explains why deep learning can ...

01/13/2021 · Learning with Gradient Descent and Weakly Convex Losses
We study the learning performance of gradient descent when the empirical...
