A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

01/01/2020
by Zhaodong Chen, et al.

In recent years, many metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and the complex serial-parallel hybrid connections in modern DNNs, evaluating the existing metrics usually requires strong assumptions or complex statistical analysis, or the metrics have limited fields of application, which constrains their adoption in the community. In this paper, inspired by Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of the gradient norm in each individual block. Because Block Dynamical Isometry is norm-based, its evaluation requires weaker assumptions than the original dynamical isometry. To ease the challenging derivation, we propose a highly modularized statistical framework based on free probability. The framework includes several key theorems to handle complex serial-parallel hybrid connections and a library to cover the diversity of network components; several sufficient prerequisites are provided as well. Powered by our metric and framework, we analyze a wide range of initialization schemes, normalization techniques, and network structures, and find that Gradient Norm Equality is a universal philosophy behind them. Based on this analysis, we improve several existing methods, including an activation function selection strategy for initialization techniques, a new configuration for weight normalization, and a depth-aware way to derive the coefficients in SeLU. Moreover, we propose a novel normalization technique named second moment normalization, which is theoretically 30% faster than batch normalization without accuracy loss. Last but not least, our conclusions and methods are supported by extensive experiments on multiple models over CIFAR10 and ImageNet.
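The central quantity behind Block Dynamical Isometry, as described above, is how much an individual block changes the norm of the backpropagated gradient. As a rough empirical illustration (not the paper's free-probability analysis), the sketch below probes this ratio for a single block in PyTorch; the block architecture, tensor shapes, and the `gradient_norm_ratio` helper are assumptions made for the example, not definitions from the paper.

```python
import torch
import torch.nn as nn

def gradient_norm_ratio(block: nn.Module, x: torch.Tensor) -> float:
    """Estimate how much `block` rescales the norm of a backpropagated gradient.

    A ratio close to 1.0 means the block roughly preserves gradient norms,
    which is the behaviour Gradient Norm Equality asks for.
    """
    x = x.clone().requires_grad_(True)
    y = block(x)
    g_out = torch.randn_like(y)  # random upstream gradient
    (g_in,) = torch.autograd.grad(y, x, grad_outputs=g_out)
    return (g_in.norm() ** 2 / g_out.norm() ** 2).item()

# Illustrative probe of a conv-BN-ReLU block at its default initialization.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
)
x = torch.randn(32, 64, 16, 16)
print(f"empirical gradient norm ratio: {gradient_norm_ratio(block, x):.3f}")
```

For a well-behaved block this ratio stays close to 1 even as blocks are stacked; values drifting far above or below 1 indicate exploding or vanishing gradients, respectively.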

