Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

01/16/2020
by Wei Hu, et al.

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance. Yet despite significant empirical and theoretical analysis, relatively little has been proved about the concrete effects of different initialization schemes. In this work, we analyze the effect of initialization in deep linear networks, and provide for the first time a rigorous proof that drawing the initial weights from the orthogonal group speeds up convergence relative to the standard Gaussian initialization with iid weights. We show that for deep networks, the width needed for efficient convergence to a global minimum with orthogonal initializations is independent of the depth, whereas the width needed for efficient convergence with Gaussian initializations scales linearly in the depth. Our results demonstrate how the benefits of a good initialization can persist throughout learning, suggesting an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
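As a minimal numerical illustration of the gap between the two schemes (a sketch, not the authors' code; the depth, width, and 1/sqrt(width) Gaussian scaling below are illustrative assumptions), one can compare the singular values of the end-to-end linear map W_L ... W_1 of a deep linear network under orthogonal and iid Gaussian initialization:

```python
# Sketch: spectrum of the end-to-end map of a deep linear network
# under orthogonal vs. iid Gaussian initialization (illustrative values).
import numpy as np

def init_layers(depth, width, scheme, rng):
    """Return `depth` square weight matrices of size width x width."""
    layers = []
    for _ in range(depth):
        if scheme == "orthogonal":
            # QR factorization of a Gaussian matrix yields a random orthogonal Q.
            q, _ = np.linalg.qr(rng.standard_normal((width, width)))
            layers.append(q)
        else:
            # iid Gaussian entries, scaled by 1/sqrt(width) (an assumed scaling).
            layers.append(rng.standard_normal((width, width)) / np.sqrt(width))
    return layers

def end_to_end(layers):
    """Compute the end-to-end map W_L ... W_1."""
    prod = layers[0]
    for w in layers[1:]:
        prod = w @ prod
    return prod

rng = np.random.default_rng(0)
for scheme in ("orthogonal", "gaussian"):
    W = end_to_end(init_layers(depth=64, width=128, scheme=scheme, rng=rng))
    svals = np.linalg.svd(W, compute_uv=False)
    # A product of orthogonal matrices is orthogonal, so every singular value
    # is exactly 1; the Gaussian product's spectrum spreads out with depth.
    print(f"{scheme:10s}  min sv = {svals.min():.3e}  max sv = {svals.max():.3e}")
```

With orthogonal initialization the end-to-end map is itself orthogonal, so its spectrum stays perfectly conditioned at any depth (dynamical isometry), whereas the extreme singular values of the Gaussian product drift apart as depth grows; this is consistent with the paper's finding that Gaussian initialization requires width scaling with depth for efficient convergence.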

Related research:

- On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization (04/13/2020)
- Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks (10/14/2019)
- Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization) (09/15/2022)
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (12/20/2013)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit (10/29/2021)
- On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization (02/19/2018)
- Deep equilibrium networks are sensitive to initialization statistics (07/19/2022)
