No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

06/20/2023
by Charles Guille-Escuret, et al.

Understanding the optimization dynamics of neural networks is necessary for closing the gap between theory and practice. Stochastic first-order optimization algorithms are known to efficiently locate favorable minima in deep neural networks. This efficiency, however, contrasts with the non-convex and seemingly complex structure of neural loss landscapes. In this study, we delve into the fundamental geometric properties of sampled gradients along optimization paths. We focus on two key quantities, which appear in the restricted secant inequality and the error bound, both of which hold high significance for first-order optimization. Our analysis reveals that these quantities exhibit predictable, consistent behavior throughout training, despite the stochasticity induced by minibatch sampling. Our findings suggest that optimization trajectories not only encounter no significant obstacles, but also maintain stable dynamics during most of training. These observed properties are sufficiently expressive to theoretically guarantee linear convergence and to prescribe learning rate schedules that mirror empirical practice. We conduct our experiments on image classification, semantic segmentation, and language modeling across different batch sizes, network architectures, datasets, optimizers, and initialization seeds, and we discuss the impact of each factor. Our work provides novel insights into the properties of neural network loss functions and opens the door to theoretical frameworks more relevant to prevalent practice.
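To make the two quantities concrete: the restricted secant inequality (RSI) lower-bounds the alignment of the gradient with the direction toward a reference solution, while the error bound (EB) lower-bounds the gradient norm relative to the distance to that solution. The sketch below is a minimal NumPy illustration of how such per-step ratios could be tracked along a stochastic optimization path; the function names, the toy quadratic objective, and the choice of the final iterate as the reference point x* are assumptions made for illustration, not the paper's exact measurement protocol.

```python
import numpy as np

def rsi_eb_along_path(gradients, iterates, x_star):
    """Per-step ratios motivated by the RSI and EB conditions (illustrative):

      RSI-like ratio:  <g_t, x_t - x*> / ||x_t - x*||^2
      EB-like ratio:   ||g_t|| / ||x_t - x*||

    g_t is a (possibly minibatch) gradient at iterate x_t and x* is a
    reference point, here taken to be the final iterate (an assumption).
    """
    rsi, eb = [], []
    for g, x in zip(gradients, iterates):
        d = x - x_star                        # displacement from the reference point
        dist_sq = float(np.dot(d, d))
        if dist_sq == 0.0:
            continue                          # skip the reference point itself
        rsi.append(float(np.dot(g, d)) / dist_sq)
        eb.append(float(np.linalg.norm(g)) / np.sqrt(dist_sq))
    return np.array(rsi), np.array(eb)

if __name__ == "__main__":
    # Toy demonstration: noisy gradient descent on f(x) = 0.5 * x^T A x,
    # where the added noise mimics minibatch sampling.
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 5.0, 10.0])
    x = np.array([2.0, -1.5, 1.0])
    lr = 0.02
    iterates, gradients = [], []
    for _ in range(200):
        g = A @ x + 0.05 * rng.normal(size=3)  # sampled (noisy) gradient
        iterates.append(x.copy())
        gradients.append(g)
        x = x - lr * g
    x_star = x.copy()                          # final iterate as reference point
    rsi, eb = rsi_eb_along_path(gradients, iterates, x_star)
    print("mean RSI ratio:", rsi.mean(), "mean EB ratio:", eb.mean())
```

In this toy setting the two ratios stay within a stable range across iterations, which is the kind of consistent behavior along the trajectory that the paper reports for real networks; the specific values here depend only on the illustrative quadratic and noise level.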
