Breaking Time Invariance: Assorted-Time Normalization for RNNs

09/28/2022
by Cole Pospisil, et al.

Methods such as Layer Normalization (LN) and Batch Normalization (BN) have proven effective in improving the training of Recurrent Neural Networks (RNNs). However, existing methods normalize using only the information available at a single time step, so the normalized preactivation has a time-independent distribution. This fails to account for temporal structure inherent in the inputs and in the RNN architecture itself. Because these networks share weights across time steps, it is also desirable for the normalization scheme to account for the connections between time steps. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using it. This setup introduces longer time dependencies into traditional normalization methods without adding any new trainable parameters. We present theoretical derivations for gradient propagation and prove the weight-scaling invariance property. Our experiments applying ATN to LN demonstrate consistent improvement on various tasks, including the Adding, Copying, and Denoise problems as well as language modeling.
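The following is a minimal PyTorch-style sketch of the idea described above, assuming that ATN applied to LN computes the normalization statistics over a sliding window of the current and previous preactivations rather than the current step alone; the class name AssortedTimeLayerNorm, the window size k, and the buffer handling are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class AssortedTimeLayerNorm(nn.Module):
        # Hypothetical sketch: normalize the current preactivation using mean and
        # variance computed over the last k consecutive time steps (per example),
        # keeping only the usual LayerNorm gain and bias as trainable parameters.
        def __init__(self, hidden_size, k=3, eps=1e-5):
            super().__init__()
            self.k = k
            self.eps = eps
            self.gain = nn.Parameter(torch.ones(hidden_size))
            self.bias = nn.Parameter(torch.zeros(hidden_size))
            self.window = []  # preactivations retained from recent time steps

        def reset(self):
            # Call at the start of each sequence to clear the stored time steps.
            self.window = []

        def forward(self, preact):
            # preact: (batch, hidden_size) preactivation at the current time step.
            self.window.append(preact)
            self.window = self.window[-self.k:]
            stacked = torch.stack(self.window, dim=0)      # (t, batch, hidden)
            mean = stacked.mean(dim=(0, 2))                # statistics over the
            var = stacked.var(dim=(0, 2), unbiased=False)  # assorted time window
            normed = (preact - mean[:, None]) / torch.sqrt(var[:, None] + self.eps)
            return self.gain * normed + self.bias

At the first time step the window holds a single entry, so this sketch reduces to standard LN; at later steps, gradients flow back through the stored preactivations of earlier steps, in line with the gradient-propagation analysis mentioned in the abstract.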


Related research:

- Batch Normalized Recurrent Neural Networks (10/05/2015): Recurrent Neural Networks (RNNs) are powerful models for sequential data...
- Layer Normalization (07/21/2016): Training state-of-the-art, deep neural networks is computationally expen...
- Online Normalization for Training Neural Networks (05/15/2019): Online Normalization is a new technique for normalizing the hidden activ...
- New Interpretations of Normalization Methods in Deep Learning (06/16/2020): In recent years, a variety of normalization methods have been proposed t...
- Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks (02/25/2016): We present weight normalization: a reparameterization of the weight vect...
- Root Mean Square Layer Normalization (10/16/2019): Layer normalization (LayerNorm) has been successfully applied to various...
- Training Language Models Using Target-Propagation (02/15/2017): While Truncated Back-Propagation through Time (BPTT) is the most popular...
