Learning with Gradient Descent and Weakly Convex Losses

01/13/2021
by Dominic Richards, et al.

We study the learning performance of gradient descent when the empirical risk is weakly convex, namely, when the smallest negative eigenvalue of the empirical risk's Hessian is bounded in magnitude. By showing that this eigenvalue can control the stability of gradient descent, we prove generalisation error bounds that hold for a wider range of step sizes than in previous work. Out-of-sample guarantees are then obtained by decomposing the test error into generalisation, optimisation and approximation errors, each of which can be bounded and traded off with respect to the algorithmic parameters, the sample size and the magnitude of this eigenvalue. In the case of a two-layer neural network, we demonstrate that the empirical risk can satisfy a notion of local weak convexity: specifically, the Hessian's smallest eigenvalue during training can be controlled by the normalisation of the layers, i.e., the network scaling. Test error guarantees then follow when the population risk minimiser satisfies a complexity assumption. By trading off the network complexity and scaling, we gain insights into the implicit bias of neural network scaling, which are further supported by experimental findings.
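The abstract's central quantity, the smallest eigenvalue of the empirical risk's Hessian, can be monitored numerically. The sketch below (not the paper's code) runs gradient descent on a scaled two-layer tanh network and tracks that eigenvalue via a finite-difference Hessian; the synthetic data, hidden width, scaling alpha = 1/sqrt(m), and step size are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem kept tiny so the finite-difference Hessian below stays cheap.
n, d, m = 30, 3, 8            # samples, input dimension, hidden width (illustrative)
alpha = 1.0 / np.sqrt(m)      # network scaling; the abstract ties this to the Hessian's smallest eigenvalue
X = rng.normal(size=(n, d))
y = np.sin(X @ rng.normal(size=d))   # synthetic regression targets

def unpack(w):
    W = w[: m * d].reshape(m, d)     # first-layer weights
    a = w[m * d:]                    # second-layer weights
    return W, a

def empirical_risk(w):
    W, a = unpack(w)
    pred = alpha * np.tanh(X @ W.T) @ a          # scaled two-layer network
    return 0.5 * np.mean((pred - y) ** 2)

def grad(w, eps=1e-5):
    # Central-difference gradient; adequate for this toy model.
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (empirical_risk(w + e) - empirical_risk(w - e)) / (2 * eps)
    return g

def lambda_min_hessian(w, eps=1e-4):
    # Finite-difference Hessian of the empirical risk, then its smallest eigenvalue.
    p = w.size
    H = np.zeros((p, p))
    for i in range(p):
        e = np.zeros_like(w)
        e[i] = eps
        H[:, i] = (grad(w + e) - grad(w - e)) / (2 * eps)
    H = 0.5 * (H + H.T)              # symmetrise away numerical noise
    return np.linalg.eigvalsh(H)[0]  # eigenvalues are returned in ascending order

w = 0.5 * rng.normal(size=m * d + m)
step_size = 0.2                      # illustrative step size
for t in range(201):
    if t % 50 == 0:
        print(f"iter {t:3d}  risk {empirical_risk(w):.4f}  "
              f"lambda_min(Hessian) {lambda_min_hessian(w):+.5f}")
    w -= step_size * grad(w)
```

In this toy setting, reducing alpha shrinks the magnitude of the Hessian's negative eigenvalues along the trajectory, in line with the role of network scaling described in the abstract; weak convexity here corresponds to that smallest eigenvalue being bounded below by a small negative constant.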


