
Universal scaling laws in the gradient descent training of neural networks

by Maksim Velikanov, et al.

Current theoretical results on the optimization trajectories of neural networks trained by gradient descent typically take the form of rigorous but potentially loose bounds on the loss values. In the present work we take a different approach and show that the learning trajectory can be characterized by an explicit asymptotic form at large training times. Specifically, the leading term in the asymptotic expansion of the loss behaves as a power law L(t) ∼ t^-ξ, with the exponent ξ expressed only through the data dimension, the smoothness of the activation function, and the class of functions being approximated. Our results are based on spectral analysis of the integral operator representing the linearized evolution of a large network trained on the expected loss. Importantly, the techniques we employ do not require a specific form of the data distribution, for example Gaussian, thus making our findings sufficiently universal.
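
The mechanism behind such a power law can be illustrated with a short numerical sketch: if the linearized (NTK-style) evolution operator has a power-law eigenvalue spectrum and the target function has power-law coefficients in the corresponding eigenbasis, each mode decays independently under gradient flow and the total loss follows a power law in t. The snippet below is only an illustration under these assumed spectral forms; the exponents nu and kappa, and the resulting relation ξ ≈ (κ − 1)/ν, are hypothetical choices for the sketch, not the exponents derived in the paper.

```python
# Minimal sketch (not the paper's derivation): a power-law spectrum of the
# linearized evolution operator plus power-law target coefficients yields
# a power-law loss decay L(t) ~ t^{-xi} under gradient flow.
# The exponents nu and kappa below are assumed values for illustration.
import numpy as np

nu, kappa = 1.5, 2.0                  # assumed eigenvalue / coefficient decay exponents
k = np.arange(1, 200_001)             # eigenmode indices
lam = k.astype(float) ** -nu          # eigenvalues of the linearized operator
c2 = k.astype(float) ** -kappa        # squared target coefficients per mode

ts = np.logspace(1, 4, 50)            # training times
# Under gradient flow each mode decays independently:
#   L(t) = sum_k c_k^2 * exp(-2 * lambda_k * t)
loss = np.array([(c2 * np.exp(-2.0 * lam * t)).sum() for t in ts])

# Estimate the exponent xi from the slope of log L versus log t.
xi = -np.polyfit(np.log(ts), np.log(loss), 1)[0]
print(f"fitted exponent xi ~ {xi:.3f}")   # for these assumptions, xi ~ (kappa - 1) / nu
```

For the assumed exponents above, the fitted slope comes out close to (κ − 1)/ν ≈ 0.67, matching the continuum estimate obtained by replacing the mode sum with an integral.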

