Mogrifier LSTM

09/04/2019
by Gábor Melis, et al.

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
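To make the mutual gating concrete: before the usual LSTM update, the mogrifier has the current input x and the previous output h_prev repeatedly rescale one another through sigmoid gates, alternating for a small number of rounds. The NumPy sketch below only illustrates that scheme and is not the authors' code; the names (mogrify, weights) and the choice of five full-rank square gating matrices are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, weights):
    """Mutually gate the input x and the previous hidden state h_prev.

    `weights` is a list of r gating matrices (hypothetical; square and
    full-rank here for simplicity). Odd rounds modulate x by h_prev;
    even rounds modulate h_prev by the freshly gated x. The factor 2
    keeps the expected scale of the activations roughly unchanged,
    since sigmoid outputs average about 0.5.
    """
    for i, w in enumerate(weights, start=1):
        if i % 2 == 1:
            x = 2.0 * sigmoid(w @ h_prev) * x
        else:
            h_prev = 2.0 * sigmoid(w @ x) * h_prev
    return x, h_prev  # these then feed an ordinary LSTM step

# Toy usage: x and h_prev share a dimension here so one shape works
# for every round; r = 5 rounds is an illustrative choice.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
h_prev = rng.standard_normal(d)
weights = [0.1 * rng.standard_normal((d, d)) for _ in range(5)]
x_mog, h_mog = mogrify(x, h_prev, weights)
```

With zero rounds this reduces to a plain LSTM; the paper reports that a handful of rounds (around five or six) works best, and that the gating matrices can be factorised to low rank to save parameters.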

Related research

Multiplicative LSTM for sequence modelling (09/26/2016)
We introduce multiplicative LSTM (mLSTM), a recurrent neural network arc...

Circling Back to Recurrent Models of Language (11/03/2022)
Just because some purely recurrent models suffer from being hard to opti...

Recurrent Highway Networks (07/12/2016)
Many sequential processing tasks require complex nonlinear transition fu...

Larger-Context Language Modelling (11/11/2015)
In this work, we propose a novel method to incorporate corpus-level disc...

Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations (12/16/2021)
Long Short-Term Memory (LSTM) and Transformers are two popular neural ar...

Persistence pays off: Paying Attention to What the LSTM Gating Mechanism Persists (10/10/2018)
Language Models (LMs) are important components in several Natural Langua...

Probing the limit of hydrologic predictability with the Transformer network (06/21/2023)
For a number of years since its introduction to hydrology, recurrent neu...
