Regularizing RNNs by Stabilizing Activations

by David Krueger, et al.
Université de Montréal

We stabilize the activations of Recurrent Neural Networks (RNNs) by penalizing the squared distance between the norms of successive hidden states. This penalty term is an effective regularizer for RNNs, including LSTMs and IRNNs: it improves performance on character-level language modeling and phoneme recognition, and it outperforms weight noise and dropout. We achieve competitive performance (18.6% PER) on the TIMIT phoneme recognition task for RNNs evaluated without beam search or an RNN transducer. With this penalty term, the IRNN can achieve performance similar to the LSTM on language modeling, although adding the penalty term to the LSTM results in superior performance. Our penalty term also prevents the exponential growth of IRNN activations beyond the training horizon, allowing IRNNs to generalize to much longer sequences.
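The penalty described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `norm_stabilizer_penalty`, the scaling coefficient `beta`, and the choice to average over time steps are assumptions made for this example; the abstract only states that the squared difference between successive hidden-state norms is penalized.

```python
import math


def norm_stabilizer_penalty(hidden_states, beta=1.0):
    """Penalize changes in hidden-state norm between successive time steps.

    hidden_states: sequence of T hidden vectors (each a sequence of floats).
    beta: hypothetical coefficient weighting the penalty (an assumption).
    Returns beta times the mean over t of (||h_t|| - ||h_{t-1}||)^2.
    """
    # Euclidean norm of each hidden state, one per time step.
    norms = [math.sqrt(sum(x * x for x in h)) for h in hidden_states]
    if len(norms) < 2:
        # Fewer than two time steps: there is no successive pair to compare.
        return 0.0
    # Squared difference between the norms of consecutive hidden states.
    sq_diffs = [(cur - prev) ** 2 for prev, cur in zip(norms, norms[1:])]
    return beta * sum(sq_diffs) / len(sq_diffs)


# A hidden state that rotates but keeps its norm incurs no penalty,
# while one whose norm collapses from 5 to 0 is penalized.
rotating = [[1.0, 0.0], [0.0, 1.0]]
collapsing = [[3.0, 4.0], [0.0, 0.0]]
print(norm_stabilizer_penalty(rotating))
print(norm_stabilizer_penalty(collapsing))
```

Note that the penalty depends only on the norms, so the direction of the hidden state is free to change from step to step. In training, a term like this would presumably be added to the task loss and differentiated by an autodiff framework; the pure-Python version here only shows the arithmetic.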


Revisiting Activation Regularization for Language RNNs

Recurrent neural networks (RNNs) serve as a fundamental building block f...

Counting in Language with RNNs

In this paper we examine a possible reason for the LSTM outperforming th...

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

We propose zoneout, a novel method for regularizing RNNs. At each timest...

Regularizing and Optimizing LSTM Language Models

Recurrent neural networks (RNNs), such as long short-term memory network...

Fast-Slow Recurrent Neural Networks

Processing sequential data of variable length is a major challenge in a ...

Alternating Synthetic and Real Gradients for Neural Language Modeling

Training recurrent neural networks (RNNs) with backpropagation through t...

Scalable Bayesian Learning of Recurrent Neural Networks for Language Modeling

Recurrent neural networks (RNNs) have shown promising performance for la...

