The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

06/07/2019
by Ryo Karakida, et al.

Normalization methods play an important role in enhancing the performance of deep learning, yet their theoretical underpinnings remain limited. To theoretically elucidate the effectiveness of normalization, we quantify the geometry of the parameter space determined by the Fisher information matrix (FIM), which also corresponds to the local shape of the loss landscape under certain conditions. We analyze deep neural networks with random initialization, which are known to suffer from a pathologically sharp loss landscape when the network becomes sufficiently wide. We reveal that batch normalization in the last layer drastically decreases this pathological sharpness, provided the width and sample number satisfy a specific condition. In contrast, batch normalization in the intermediate hidden layers fails to alleviate pathological sharpness in many settings. We also find that layer normalization cannot alleviate pathological sharpness. Thus, we conclude that it is batch normalization in the last layer that significantly decreases the sharpness induced by the FIM.
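As a rough numerical illustration of this claim (a sketch, not the paper's derivation), the snippet below estimates the largest FIM eigenvalue of a randomly initialized ReLU network with and without a simplified last-layer batch normalization, modeled here as standardizing the scalar output across the minibatch with no learnable scale or shift. The architecture, width, and sample count are arbitrary illustrative choices. For squared loss, the FIM is the average of per-example outer products of output gradients, so its nonzero spectrum coincides with that of the small N x N Gram matrix of per-example gradients, which keeps the computation tractable.

```python
import jax
import jax.numpy as jnp

def init_params(key, widths):
    # Random Gaussian weights with 1/sqrt(fan_in) scaling, as in mean-field setups.
    keys = jax.random.split(key, len(widths) - 1)
    return [jax.random.normal(k, (m, n)) / jnp.sqrt(n)
            for k, n, m in zip(keys, widths[:-1], widths[1:])]

def forward(params, X, last_layer_bn=False):
    h = X
    for W in params[:-1]:
        h = jax.nn.relu(h @ W.T)
    f = (h @ params[-1].T)[:, 0]  # scalar output per example, shape (N,)
    if last_layer_bn:
        # Simplified last-layer batch norm: standardize outputs across the batch.
        f = (f - f.mean()) / (f.std() + 1e-8)
    return f

def top_fim_eigenvalue(params, X, last_layer_bn=False):
    # Per-example output Jacobian J (N x P). For squared loss the FIM is
    # (1/N) J^T J; its nonzero eigenvalues equal those of (1/N) J J^T.
    jac = jax.jacobian(lambda p: forward(p, X, last_layer_bn))(params)
    J = jnp.concatenate([j.reshape(X.shape[0], -1) for j in jac], axis=1)
    gram = (J @ J.T) / X.shape[0]
    return float(jnp.linalg.eigvalsh(gram)[-1])

key_x, key_p = jax.random.split(jax.random.PRNGKey(0))
X = jax.random.normal(key_x, (64, 100))        # N=64 samples, input dim 100
params = init_params(key_p, [100, 2000, 1])    # one hidden layer, width 2000

print("plain net    :", top_fim_eigenvalue(params, X))
print("last-layer BN:", top_fim_eigenvalue(params, X, last_layer_bn=True))
```

Under the abstract's claim, one would expect the top eigenvalue to shrink markedly in the batch-normalized case, since centering the output across the batch suppresses the mean-gradient direction that dominates the FIM spectrum in wide networks; this sketch is only meant to probe that trend, not to reproduce the paper's quantitative conditions.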


Related research

06/11/2020 · Optimization Theory for ReLU Neural Networks Trained with Normalization Layers
The success of deep neural networks is in part due to the use of normali...

05/26/2023 · Ghost Noise for Regularizing Deep Neural Networks
Batch Normalization (BN) is widely used to stabilize the optimization pr...

03/06/2019 · Mean-field Analysis of Batch Normalization
Batch Normalization (BatchNorm) is an extremely useful component of mode...

05/28/2023 · On the impact of activation and normalization in obtaining isometric embeddings at initialization
In this paper, we explore the structure of the penultimate Gram matrix i...

11/21/2019 · Rethinking Normalization and Elimination Singularity in Neural Networks
In this paper, we study normalization methods for neural networks from t...

10/14/2019 · Pathological spectra of the Fisher information metric and its variants in deep neural networks
The Fisher information matrix (FIM) plays an essential role in statistic...

03/28/2022 · To Fold or Not to Fold: a Necessary and Sufficient Condition on Batch-Normalization Layers Folding
Batch-Normalization (BN) layers have become fundamental components in th...
