Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning

06/10/2021
by Ekdeep Singh Lubana, et al.

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties of BatchNorm that explain its success. However, given the pursuit of alternative normalization techniques, these properties need to be generalized so that any given layer's success or failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to nine recently proposed normalization layers. Our primary findings are as follows: (i) similar to BatchNorm, activations-based normalization layers can prevent exploding activations in ResNets; (ii) use of GroupNorm ensures the rank of activations is at least Ω(√(width / group size)), explaining why LayerNorm exhibits slow optimization; (iii) small group sizes yield large gradient norms in earlier layers, explaining the training instability of Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals several general mechanisms that explain the success of normalization techniques in deep learning, providing a compass with which to systematically explore the vast design space of DNN normalization layers.
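To make finding (ii) concrete, here is a minimal sketch (not the paper's code) that probes the numerical rank of activations in a randomly initialized, GroupNorm-equipped MLP as the group size varies. It relies on PyTorch's nn.GroupNorm, which recovers a LayerNorm-style layer when all channels share one group (group size = width) and an InstanceNorm-style layer when each channel is its own group (group size = 1). The width, depth, and sample count are illustrative assumptions, not the paper's experimental setup, so the measured ranks should be read as a qualitative probe of the Ω(√(width / group size)) bound rather than a reproduction of it.

    # Minimal sketch: measure the numerical rank of activations at random
    # initialization for a range of group sizes. Toy dimensions assumed;
    # this is not the paper's architecture or measurement protocol.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    width, depth, n_samples = 256, 20, 1024  # assumed toy configuration

    def make_net(group_size: int) -> nn.Sequential:
        """Stack of Linear -> GroupNorm -> ReLU blocks at random init.

        group_size == width gives one group (LayerNorm-style);
        group_size == 1 gives one channel per group (InstanceNorm-style).
        """
        layers = []
        for _ in range(depth):
            layers += [
                nn.Linear(width, width),
                nn.GroupNorm(num_groups=width // group_size, num_channels=width),
                nn.ReLU(),
            ]
        return nn.Sequential(*layers)

    x = torch.randn(n_samples, width)
    for group_size in [1, 4, 16, 64, 256]:
        net = make_net(group_size)
        with torch.no_grad():
            acts = net(x)  # final-layer activations, shape (n_samples, width)
        # Numerical rank of the activation matrix across samples; finding (ii)
        # lower-bounds this by Omega(sqrt(width / group size)).
        rank = torch.linalg.matrix_rank(acts).item()
        print(f"group size {group_size:3d} -> activation rank {rank:3d}")

Note that PyTorch parameterizes GroupNorm by the number of groups rather than the group size, so the sketch converts between the two; sweeping group_size from 1 up to width traverses the InstanceNorm-to-LayerNorm spectrum that the abstract's speed-stability tradeoff refers to.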


