Rethinking Normalization and Elimination Singularity in Neural Networks

11/21/2019
by   Siyuan Qiao, et al.
0

In this paper, we study normalization methods for neural networks from the perspective of elimination singularity. Elimination singularities correspond to the points on the training trajectory where neurons become consistently deactivated. They cause degenerate manifolds in the loss landscape which will slow down training and harm model performances. We show that channel-based normalizations (e.g. Layer Normalization and Group Normalization) are unable to guarantee a far distance from elimination singularities, in contrast with Batch Normalization which by design avoids models from getting too close to them. To address this issue, we propose BatchChannel Normalization (BCN), which uses batch knowledge to avoid the elimination singularities in the training of channel-normalized models. Unlike Batch Normalization, BCN is able to run in both large-batch and micro-batch training settings. The effectiveness of BCN is verified on many tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is here: https://github.com/joe-siyuan-qiao/Batch-Channel-Normalization.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/25/2019

Weight Standardization

In this paper, we propose Weight Standardization (WS) to accelerate deep...
research
08/31/2019

Towards Improving Generalization of Deep Networks via Consistent Normalization

Batch Normalization (BN) was shown to accelerate training and improve ge...
research
12/12/2019

Local Context Normalization: Revisiting Local Normalization

Normalization layers have been shown to improve convergence in deep neur...
research
08/12/2019

Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise

Batch Normalization (BN) (Ioffe and Szegedy 2015) normalizes the feature...
research
07/06/2022

Difference in Euclidean Norm Can Cause Semantic Divergence in Batch Normalization

In this paper, we show that the difference in Euclidean norm of samples ...
research
06/07/2019

The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

Normalization methods play an important role in enhancing the performanc...
research
02/24/2023

On the Training Instability of Shuffling SGD with Batch Normalization

We uncover how SGD interacts with batch normalization and can exhibit un...

Please sign up or login with your details

Forgot password? Click here to reset