Mogrifier LSTM

by Gábor Melis, et al.

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory (LSTM) in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
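The abstract describes the core mechanism only as mutual gating of the current input and the previous output. Below is a minimal sketch of such gating in plain Python. It assumes the alternating scheme we understand the full paper to use: over a fixed number of rounds, odd rounds rescale the input x by 2σ(Q h_prev) and even rounds rescale h_prev by 2σ(R x), so each vector is modulated by the most recent version of the other. The factor of 2 keeps every gate in (0, 2), so zero pre-activations leave both vectors unchanged. The function name `mogrify`, its argument layout, and the list-of-lists matrices are hypothetical choices made for illustration. In the model, the gated x and h_prev would then be passed to an ordinary LSTM cell, which is omitted here.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mogrify(x, h_prev, Q, R, rounds):
    """Sketch of mutual gating between input x and previous output h_prev.

    Assumed scheme (illustrative, not an official implementation):
      odd round i:  x      <- 2 * sigmoid(Q_k @ h_prev) * x
      even round i: h_prev <- 2 * sigmoid(R_k @ x) * h_prev
    Q holds one len(x) x len(h_prev) matrix per odd round;
    R holds one len(h_prev) x len(x) matrix per even round.
    """
    if len(Q) < (rounds + 1) // 2 or len(R) < rounds // 2:
        raise ValueError("not enough Q/R matrices for the requested rounds")
    x, h = list(x), list(h_prev)
    qi = ri = 0
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # Gate the input with the current (possibly already gated) state.
            gate = [2.0 * sigmoid(z) for z in matvec(Q[qi], h)]
            x = [g * xv for g, xv in zip(gate, x)]
            qi += 1
        else:
            # Gate the state with the most recently gated input.
            gate = [2.0 * sigmoid(z) for z in matvec(R[ri], x)]
            h = [g * hv for g, hv in zip(gate, h)]
            ri += 1
    return x, h
```

With all-zero Q and R every gate equals exactly 1, so `mogrify` returns its inputs unchanged; with zero rounds it reduces to a plain LSTM input. In a trained model the matrices would be learned parameters.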




