Information Geometry of Orthogonal Initializations and Training

10/09/2018
by Piotr A. Sokol, et al.

Recently, mean field theory has been used successfully to analyze the properties of wide, random neural networks. It has given rise to a prescriptive theory for initializing neural networks that ensures the ℓ_2 norm of the backpropagated gradients is bounded and that training is orders of magnitude faster. Despite the strong empirical performance of this class of initializations, the mechanisms by which they confer an advantage in the optimization of deep neural networks are poorly understood. Here we show a novel connection between the maximum curvature of the optimization landscape (gradient smoothness), as measured by the Fisher information matrix, and the maximum singular value of the input-output Jacobian. Our theory partially explains why more isometric neural networks can train much faster. Furthermore, we experimentally investigate the benefits of maintaining orthogonality throughout training, from which we conclude that manifold-constrained optimization of the weights performs better regardless of the smoothness of the gradients. Finally, we show that critical orthogonal initializations do not trivially give rise to a mean field limit of the pre-activations in each layer.
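The sketch below (not code from the paper) illustrates the two ingredients the abstract connects: an orthogonally initialized deep tanh network and the singular values of its input-output Jacobian, whose maximum the authors relate to the largest curvature of the loss landscape as measured by the Fisher information matrix. The width, depth, and gain are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: orthogonal initialization of a deep tanh MLP and the
# spectrum of its input-output Jacobian at a random input.
# Width, depth, and the tanh gain are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

width, depth = 128, 20
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width)
    # Orthogonal weight matrix; the gain compensates for the tanh contraction.
    nn.init.orthogonal_(linear.weight, gain=nn.init.calculate_gain("tanh"))
    nn.init.zeros_(linear.bias)
    layers += [linear, nn.Tanh()]
net = nn.Sequential(*layers)

x = torch.randn(width)
# Input-output Jacobian of the network at x (a width x width matrix).
J = torch.autograd.functional.jacobian(net, x)
sigma = torch.linalg.svdvals(J)
print(f"max singular value: {sigma.max():.3f}, min: {sigma.min():.3f}")
```

With an orthogonal, critically tuned initialization the singular values of J stay tightly clustered (dynamical isometry), whereas a naive Gaussian initialization lets them spread out; per the abstract's claim, a larger maximum singular value corresponds to a larger maximum curvature of the optimization landscape.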


Related research

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach (06/04/2018)
A mean-field limit for certain deep neural networks (06/01/2019)
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice (11/13/2017)
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks (06/14/2018)
A Riemannian Mean Field Formulation for Two-layer Neural Networks with Batch Normalization (10/17/2021)
Mean field theory for deep dropout networks: digging up gradient backpropagation deeply (12/19/2019)
Convergence of Adam Under Relaxed Assumptions (04/27/2023)
