Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

by Daniel Kunin et al.

Predicting the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space proceeds in discrete, finite steps along complex stochastic gradients derived from real-world datasets. We circumvent this obstacle through a unifying theoretical framework based on intrinsic symmetries embedded in a network's architecture, which are present for any dataset. We show that any such symmetry imposes stringent geometric constraints on gradients and Hessians, leading to an associated conservation law in the continuous-time limit of stochastic gradient descent (SGD), akin to Noether's theorem in physics. We further show that the finite learning rates used in practice can actually break these symmetry-induced conservation laws. We apply tools from finite difference methods to derive modified gradient flow, a differential equation that better approximates the numerical trajectory taken by SGD at finite learning rates. We combine modified gradient flow with our framework of symmetries to derive exact integral expressions for the dynamics of certain parameter combinations. We empirically validate our analytic predictions for learning dynamics on VGG-16 trained on Tiny ImageNet. Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.
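To make the symmetry-to-conservation-law correspondence concrete, here is a minimal sketch (a toy loss of our own choosing, not an example from the paper): the loss L(w1, w2) = 0.5*(w1*w2 - 1)^2 is invariant under the rescale symmetry w1 -> a*w1, w2 -> w2/a, and the quantity Q = w1^2 - w2^2 is exactly conserved under gradient flow. At a finite learning rate eta, one gradient descent step multiplies Q by (1 - eta^2 * r^2), where r = w1*w2 - 1, so Q drifts:

def grads(w1, w2):
    r = w1 * w2 - 1.0        # dL/d(w1*w2) for L = 0.5 * (w1*w2 - 1)^2
    return r * w2, r * w1    # dL/dw1, dL/dw2

w1, w2 = 2.0, 1.0            # Q = w1^2 - w2^2 starts at 3.0
eta = 0.1                    # finite learning rate breaks conservation
for step in range(201):
    if step % 50 == 0:
        print(f"step {step:3d}  loss = {0.5*(w1*w2 - 1)**2:.2e}  Q = {w1**2 - w2**2:.6f}")
    g1, g2 = grads(w1, w2)
    w1, w2 = w1 - eta * g1, w2 - eta * g2

As eta -> 0 the per-step factor (1 - eta^2 * r^2) tends to 1 and Q is exactly conserved, mirroring the paper's claim that conservation laws hold in the continuous-time limit of SGD but are broken at the finite learning rates used in practice.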
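The role of modified gradient flow can likewise be checked on a toy problem. The sketch below is again our own construction, assuming the standard first-order correction from backward error analysis, dw/dt = -grad L - (eta/2) * Hess L * grad L; it uses the quadratic loss L(w) = 0.5*lam*w^2, for which gradient descent, gradient flow, and the modified flow all have closed forms:

import math

# On L(w) = 0.5*lam*w^2, gradient descent gives w_k = w0*(1 - eta*lam)^k.
# Plain gradient flow predicts w(k*eta) = w0*exp(-lam*eta*k); the modified
# flow dw/dt = -lam*w - (eta/2)*lam^2*w predicts
# w(k*eta) = w0*exp(-lam*eta*(1 + eta*lam/2)*k).
w0, lam, eta = 1.0, 1.0, 0.2
for k in (1, 5, 25):
    gd       = w0 * (1.0 - eta * lam) ** k          # discrete GD iterate
    flow     = w0 * math.exp(-lam * eta * k)        # gradient flow prediction
    modified = w0 * math.exp(-lam * eta * (1.0 + eta * lam / 2.0) * k)
    print(f"k={k:2d}  GD={gd:.6f}  "
          f"flow err={abs(flow - gd):.2e}  modified err={abs(modified - gd):.2e}")

Because log(1 - x) = -x - x^2/2 - O(x^3), the modified flow matches the discrete iterates to one order higher in eta*lam than plain gradient flow, which is the sense in which modified gradient flow better approximates the numerical trajectory taken by SGD at finite learning rates.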

