Understanding and Improving Layer Normalization

11/16/2019
by Jingjing Xu, et al.

Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where its effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many previous studies believe that the success of LayerNorm comes from forward normalization. Unlike them, we find that the derivatives of the mean and variance are more important than forward normalization, because they re-center and re-scale backward gradients. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not help in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets and obtains state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), which replaces the bias and gain with a new transformation function. Experiments show that AdaNorm demonstrates better results than LayerNorm on seven out of eight datasets.
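To make the distinction between LayerNorm and LayerNorm-simple concrete, here is a minimal NumPy sketch (not the authors' code): both apply the same forward normalization over the last dimension, and LayerNorm-simple simply omits the learnable bias and gain. The `eps` constant and the example shapes are assumptions for illustration, not values from the paper.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-6):
    # Standard LayerNorm: normalize each example over its last dimension,
    # then apply the learnable gain (scale) and bias (shift) parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)
    return gain * y + bias

def layer_norm_simple(x, eps=1e-6):
    # LayerNorm-simple: identical forward normalization, but the bias and
    # gain are dropped entirely, as described in the abstract.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Example usage on a batch of hidden vectors (hypothetical shapes):
# x = np.random.randn(32, 512)
# out = layer_norm(x, gain=np.ones(512), bias=np.zeros(512))
# out_simple = layer_norm_simple(x)
```

AdaNorm keeps this same normalization step but replaces the gain and bias with a transformation function of the normalized values; its exact form is defined in the full paper and is not reproduced here.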
