Regularization and nonlinearities for neural language models: when are they needed?

01/23/2013
by   Marius Pachitariu, et al.
0

Neural language models (LMs) based on recurrent neural networks (RNN) are some of the most successful word and character-level LMs. Why do they work so well, in particular better than linear neural LMs? Possible explanations are that RNNs have an implicitly better regularization or that RNNs have a higher capacity for storing patterns due to their nonlinearities or both. Here we argue for the first explanation in the limit of little training data and the second explanation for large amounts of text data. We show state-of-the-art performance on the popular and small Penn dataset when RNN LMs are regularized with random dropout. Nonetheless, we show even better performance from a simplified, much less expressive linear RNN model without off-diagonal entries in the recurrent matrix. We call this model an impulse-response LM (IRLM). Using random dropout, column normalization and annealed learning rates, IRLMs develop neurons that keep a memory of up to 50 words in the past and achieve a perplexity of 102.5 on the Penn dataset. On two large datasets however, the same regularization methods are unsuccessful for both models and the RNN's expressivity allows it to overtake the IRLM by 10 and 20 percent perplexity, respectively. Despite the perplexity gap, IRLMs still outperform RNNs on the Microsoft Research Sentence Completion (MRSC) task. We develop a slightly modified IRLM that separates long-context units (LCUs) from short-context units and show that the LCUs alone achieve a state-of-the-art performance on the MRSC task of 60.8 neural LMs lies in developing more accessible internal representations, and suggests an optimization regime of very high momentum terms for effectively training such models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/03/2018

Noisin: Unbiased Regularization for Recurrent Neural Networks

Recurrent neural networks (RNNs) are powerful models of sequential data....
research
07/05/2015

Dependency Recurrent Neural Language Models for Sentence Completion

Recent work on language modelling has shifted focus from count-based mod...
research
10/22/2018

An Exploration of Dropout with RNNs for Natural Language Inference

Dropout is a crucial regularization technique for the Recurrent Neural N...
research
02/24/2016

Toward Mention Detection Robustness with Recurrent Neural Networks

One of the key challenges in natural language processing (NLP) is to yie...
research
10/31/2017

Fraternal Dropout

Recurrent neural networks (RNNs) are important class of architectures am...
research
06/03/2016

Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations

We propose zoneout, a novel method for regularizing RNNs. At each timest...
research
05/16/2018

Learning to Write with Cooperative Discriminators

Recurrent Neural Networks (RNNs) are powerful autoregressive sequence mo...

Please sign up or login with your details

Forgot password? Click here to reset