Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

03/22/2021
by Yuxiang Liu, et al.

Normalization operations are essential for state-of-the-art neural networks and enable training a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weight Normalization (WN). In this work, we demonstrate that the shift of the average gradient amplifies the variance of every convolutional (conv) layer. We propose Parametric Weights Standardization (PWS), a fast module for conv filters that is robust to mini-batch size, to solve the shift of the average gradient. PWS can provide the same speed-up as BN; in addition, it requires less computation and does not change the output of a conv layer. PWS enables the network to converge fast without normalizing the outputs. This result strengthens the case for the shift of the average gradient and explains why BN works from the perspective of variance transmission. The code and appendix are available at https://github.com/lyxzzz/PWSConv.
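As a rough illustration of the idea of standardizing conv filters rather than layer outputs, the PyTorch-style sketch below standardizes each filter to zero mean and unit variance and rescales it with a learnable per-channel gain. The class name `PWSConv2d`, the `gain` parameter, and the `eps` constant are illustrative assumptions, not the authors' implementation; the exact PWS formulation is given in the paper and the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PWSConv2d(nn.Conv2d):
    """Sketch of a weight-standardized conv layer with a learnable gain.

    Each filter is shifted to zero mean and scaled to unit variance before
    the convolution, so the layer's outputs are never normalized directly.
    This only illustrates the general idea; it is not the authors' code.
    """

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        # Learnable per-output-channel gain (the "parametric" part, assumed).
        self.gain = nn.Parameter(torch.ones(out_channels, 1, 1, 1))
        self.eps = 1e-5  # numerical stability constant (assumed value)

    def forward(self, x):
        w = self.weight
        # Remove each filter's mean so a shift of the average gradient
        # cannot accumulate in the filter, then rescale to unit variance.
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w = (w - mean) / torch.sqrt(var + self.eps) * self.gain
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# Quick smoke test on random data.
layer = PWSConv2d(3, 16, 3, padding=1)
out = layer(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```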
