On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

04/13/2020
by   Wei Huang, et al.

In recent years, a critical initialization scheme based on orthogonal weights has been proposed for deep nonlinear networks. Orthogonal weights are crucial for achieving dynamical isometry in random networks, where the entire spectrum of singular values of the input-output Jacobian concentrates around one. Strong empirical evidence that orthogonal initialization speeds up training, compared with Gaussian initialization, in linear networks and in the linear regime of nonlinear networks has attracted great interest. One recent work has proven the benefit of orthogonal initialization in linear networks; however, the training dynamics behind it have not been revealed for nonlinear networks. In this work, we study the Neural Tangent Kernel (NTK), which describes the gradient-descent training of wide networks, for orthogonally initialized, wide, fully-connected, nonlinear networks. We prove that the NTKs of Gaussian and orthogonal weights are equal when the network width is infinite, which implies that any training speed-up from orthogonal initialization is a finite-width effect in the small-learning-rate region. We then find that during training, the NTK of an infinite-width network with orthogonal initialization stays constant in theory and, empirically, varies at a rate of the same order as that of Gaussian initialization as the width tends to infinity. Finally, we conduct a thorough empirical investigation of training speed on the CIFAR-10 dataset and show that the benefit of orthogonal initialization lies in the regime of large learning rates and large depth, where nonlinear networks operate in their linear regime.
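To make the two ingredients above concrete, here is a minimal sketch in NumPy (the width, depth, and function names are illustrative choices, not taken from the paper) of orthogonal initialization via a QR decomposition, together with the dynamical-isometry check on the input-output Jacobian of a deep linear network, where the Jacobian is simply the product of the weight matrices.

```python
# Minimal sketch: orthogonal vs. Gaussian initialization and the singular-value
# spectrum of the input-output Jacobian of a deep *linear* network.
# Width/depth values below are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

def orthogonal(n, gain=1.0):
    """Sample an n x n orthogonal matrix via QR of a Gaussian matrix."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))  # sign fix so the draw is uniform over the orthogonal group
    return gain * q

def gaussian(n, gain=1.0):
    """Fan-in Gaussian initialization with variance gain^2 / n."""
    return rng.standard_normal((n, n)) * gain / np.sqrt(n)

width, depth = 256, 20

for name, init in (("orthogonal", orthogonal), ("gaussian", gaussian)):
    # For a deep linear network the input-output Jacobian is the product
    # of the weight matrices, so we can inspect its spectrum directly.
    jac = np.eye(width)
    for _ in range(depth):
        jac = init(width) @ jac
    svals = np.linalg.svd(jac, compute_uv=False)
    print(f"{name:>10}: singular values in [{svals.min():.3f}, {svals.max():.3f}]")
```

Running this shows the orthogonal product keeping every singular value at exactly one (dynamical isometry), while the Gaussian product's spectrum spreads out as depth grows; the paper's NTK analysis asks how much of that difference survives in wide nonlinear networks.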


Related research

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks (01/16/2020)
Effects of Depth, Width, and Initialization: A Convergence Analysis of Layer-wise Training for Deep Linear Neural Networks (10/14/2019)
Dense neural networks as sparse graphs and the lightning initialization (09/24/2018)
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (12/20/2013)
The Emergence of Spectral Universality in Deep Networks (02/27/2018)
Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty (09/19/2022)
Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime (09/08/2023)
