Learning threshold neurons via the "edge of stability"

12/14/2022
by Kwangjun Ahn et al.

Existing analyses of neural network training often operate under the unrealistic assumption of an extremely small learning rate. This lies in stark contrast to practical wisdom and empirical studies, such as the work of J. Cohen et al. (ICLR 2021), which exhibit startling new phenomena (the "edge of stability" or "unstable convergence") and potential benefits for generalization in the large learning rate regime. Despite a flurry of recent works on this topic, however, the latter effect is still poorly understood. In this paper, we take a step towards understanding genuinely non-convex training dynamics with large learning rates by performing a detailed analysis of gradient descent for simplified models of two-layer neural networks. For these models, we provably establish the edge of stability phenomenon and discover a sharp phase transition for the step size below which the neural network fails to learn "threshold-like" neurons (i.e., neurons with a non-zero first-layer bias). This elucidates one possible mechanism by which the edge of stability can in fact lead to better generalization, as threshold neurons are basic building blocks with useful inductive bias for many tasks.
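
As a rough illustration of the setup described in the abstract (not the authors' code), the sketch below trains a tiny two-layer ReLU network with plain full-batch gradient descent at two step sizes, estimates the sharpness (top Hessian eigenvalue) by power iteration so it can be compared with the edge-of-stability threshold 2/eta, and reports the size of the first-layer bias, i.e. whether "threshold-like" neurons emerged. The toy target, network width, step counts, and step sizes are all illustrative assumptions and may need tuning.

```python
# Hypothetical sketch, not the paper's experiments: full-batch GD on a tiny
# two-layer ReLU network, tracking sharpness vs. 2/eta and the first-layer bias.
import torch

torch.manual_seed(0)

# Toy 1D regression task: the target 1{x > 0.5} is easiest to fit with
# threshold-like neurons, i.e. ReLUs whose input bias is non-zero (assumed task).
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = (x > 0.5).float()

def make_model(width=32):
    return torch.nn.Sequential(
        torch.nn.Linear(1, width),   # first-layer bias acts as the "threshold"
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )

def loss_fn(model):
    return torch.nn.functional.mse_loss(model(x), y)

def sharpness(model, iters=30):
    """Estimate the top Hessian eigenvalue of the loss via power iteration
    on Hessian-vector products (standard technique, not specific to the paper)."""
    params = list(model.parameters())
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(model), params, create_graph=True)
        hv = torch.autograd.grad(grads, params, grad_outputs=v)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    grads = torch.autograd.grad(loss_fn(model), params, create_graph=True)
    hv = torch.autograd.grad(grads, params, grad_outputs=v)
    return sum((h * vi).sum() for h, vi in zip(hv, v)).item()

for eta in (0.01, 0.5):              # small vs. large step size (assumed values)
    torch.manual_seed(0)
    model = make_model()
    for step in range(2000):
        loss = loss_fn(model)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= eta * p.grad    # plain full-batch gradient descent
    bias_size = model[0].bias.detach().abs().mean().item()
    print(f"eta={eta}: loss={loss_fn(model).item():.4f}  "
          f"sharpness={sharpness(model):.2f}  2/eta={2 / eta:.1f}  "
          f"mean |first-layer bias|={bias_size:.3f}")
```

Power iteration on Hessian-vector products avoids forming the Hessian explicitly; whether the sharpness settles near 2/eta and whether the first-layer bias moves away from zero only at the larger step size is exactly the kind of behavior the paper analyzes, though this toy setup is not guaranteed to reproduce it.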

Related research

05/15/2020 · Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
06/10/2022 · The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
03/04/2020 · The large learning rate phase of deep learning: the catapult mechanism
06/08/2022 · On Gradient Descent Convergence beyond the Edge of Stability
07/09/2018 · Learning Functions in Large Networks requires Modularity and produces Multi-Agent Dynamics
05/19/2022 · Understanding Gradient Descent on Edge of Stability in Deep Learning
02/18/2023 · Generalization and Stability of Interpolating Neural Networks with Minimal Width
