Mogrifier LSTM

09/04/2019
by Gábor Melis, et al.

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
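
To make the mutual gating concrete, here is a minimal NumPy sketch of the mechanism the abstract describes. It is an illustration rather than the authors' released code: the function name mogrify, the five-round default, and the weight layout are assumptions made for exposition. Odd-numbered rounds rescale the input x with a gate computed from the previous output h; even-numbered rounds rescale h with a gate computed from the freshly updated x. The gated pair then replaces (x, h) in the ordinary LSTM update.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def mogrify(x, h, Q, R, rounds=5):
        # Q: list of (len(x), len(h)) matrices; R: list of (len(h), len(x)) matrices.
        # The factor of 2 keeps the expected scale of x and h roughly unchanged,
        # since a sigmoid's output averages around 0.5.
        for i in range(1, rounds + 1):
            if i % 2 == 1:
                x = 2.0 * sigmoid(Q[i // 2] @ h) * x      # gate x using h
            else:
                h = 2.0 * sigmoid(R[i // 2 - 1] @ x) * h  # gate h using x
        return x, h  # fed to the standard LSTM cell in place of (x, h)

With rounds set to zero the loop is a no-op and the model reduces to a plain LSTM, which is why the extension strictly generalizes its baseline.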

Related research

09/26/2016

Multiplicative LSTM for sequence modelling

We introduce multiplicative LSTM (mLSTM), a recurrent neural network arc...
07/12/2016

Recurrent Highway Networks

Many sequential processing tasks require complex nonlinear transition fu...
12/16/2021

Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations

Long Short-Term Memory (LSTM) and Transformers are two popular neural ar...
11/11/2015

Larger-Context Language Modelling

In this work, we propose a novel method to incorporate corpus-level disc...
02/08/2017

Automatic Rule Extraction from Long Short Term Memory Networks

Although deep learning models have proven effective at solving problems ...
01/16/2019

Variable-sized input, character-level recurrent neural networks in lead generation: predicting close rates from raw user inputs

Predicting lead close rates is one of the most problematic tasks in the ...
10/04/2018

Recurrent Transition Networks for Character Locomotion

Manually authoring transition animations for a complete locomotion syste...