Layer Normalization

07/21/2016
by Jimmy Lei Ba, et al.

Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
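As a concrete illustration of the computation the abstract describes, here is a minimal NumPy sketch (not code from the paper): the mean and variance are computed over the hidden units of a single training case, and a per-neuron gain and bias are applied after normalization, before the non-linearity. The names `layer_norm`, `gain`, `bias`, and the small `eps` stabilizer are illustrative choices, not the paper's notation.

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    # Mean and std are taken over the H hidden units of the layer for
    # this one training case, so no mini-batch statistics are involved
    # and the same computation runs at training and test time.
    mu = a.mean()
    sigma = a.std()                      # population std over the layer's units
    a_hat = (a - mu) / (sigma + eps)     # eps: illustrative numerical guard
    # Per-neuron adaptive gain and bias, applied after normalization
    # but before the non-linearity.
    return gain * a_hat + bias

# Usage: normalize the summed inputs, then apply the non-linearity.
H = 8                                    # hidden units in the layer
rng = np.random.default_rng(0)
a = rng.normal(size=H)                   # summed inputs to one layer
g, b = np.ones(H), np.zeros(H)           # common initialization: gain 1, bias 0
h = np.tanh(layer_norm(a, g, b))
```

In a recurrent network, the same function would be applied to the summed inputs at every time step, recomputing the statistics per step, which is what makes the technique straightforward to use with RNNs.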

Related research

- Batch Normalized Recurrent Neural Networks (10/05/2015): Recurrent Neural Networks (RNNs) are powerful models for sequential data...
- Batch Layer Normalization, A new normalization layer for CNNs and RNN (09/19/2022): This study introduces a new normalization layer termed Batch Layer Norma...
- Normalizing the Normalizers: Comparing and Extending Network Normalization Schemes (11/14/2016): Normalization techniques have only recently begun to be exploited in sup...
- Convolutional Normalization (02/19/2021): As the deep neural networks are being applied to complex tasks, the size...
- Understanding and Improving Layer Normalization (11/16/2019): Layer normalization (LayerNorm) is a technique to normalize the distribu...
- Breaking Time Invariance: Assorted-Time Normalization for RNNs (09/28/2022): Methods such as Layer Normalization (LN) and Batch Normalization (BN) ha...
- Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks (02/20/2017): Traditionally, multi-layer neural networks use dot product between the o...
