Recurrent Highway Networks

07/12/2016
by Julian Georg Zilly, et al.

Many sequential processing tasks require complex nonlinear transition functions from one step to the next. However, recurrent neural networks with 'deep' transition functions remain difficult to train, even when using Long Short-Term Memory (LSTM) networks. We introduce a novel theoretical analysis of recurrent networks based on Geršgorin's circle theorem that illuminates several modeling and optimization issues and improves our understanding of the LSTM cell. Based on this analysis, we propose Recurrent Highway Networks (RHNs), which extend the LSTM architecture to allow step-to-step transition depths larger than one. Several language modeling experiments demonstrate that the proposed architecture results in powerful and efficient models. On the Penn Treebank corpus, solely increasing the transition depth from 1 to 10 improves word-level perplexity from 90.6 to 65.4 using the same number of parameters. On the larger Wikipedia character-prediction datasets (text8 and enwik8), RHNs outperform all previous results and achieve an entropy of 1.27 bits per character.
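To make the idea of "transition depth larger than one" concrete, here is a minimal NumPy sketch of a single RHN time step: the hidden state passes through several stacked highway layers before the next input arrives, with the carry gate coupled to the transform gate (c = 1 - t). All names, sizes, and initialization here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_step(x, s_prev, params, depth):
    """One time step of a Recurrent Highway Network.

    The state s is refined by `depth` highway layers; the input x
    feeds only the first layer. Carry gate is coupled: c = 1 - t.
    """
    s = s_prev
    for l in range(depth):
        Wh, Wt, Rh, Rt, bh, bt = params[l]
        x_in = x if l == 0 else np.zeros_like(x)  # input only at layer 0
        h = np.tanh(x_in @ Wh + s @ Rh + bh)      # candidate update
        t = sigmoid(x_in @ Wt + s @ Rt + bt)      # transform gate
        s = h * t + s * (1.0 - t)                 # highway combination
    return s

# Toy configuration: input dim 4, hidden dim 8, transition depth 3.
rng = np.random.default_rng(0)
D, H, L = 4, 8, 3
params = [(rng.standard_normal((D, H)) * 0.1,
           rng.standard_normal((D, H)) * 0.1,
           rng.standard_normal((H, H)) * 0.1,
           rng.standard_normal((H, H)) * 0.1,
           np.zeros(H), np.zeros(H)) for _ in range(L)]

s = np.zeros(H)
for x in rng.standard_normal((5, D)):  # run 5 time steps
    s = rhn_step(x, s, params, depth=L)
print(s.shape)  # (8,)
```

Setting `L = 1` recovers a single-layer highway recurrence; the paper's Penn Treebank result comes from raising this depth (to 10) while holding the parameter count fixed.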

Related research:

- Multiplicative LSTM for sequence modelling (09/26/2016): We introduce multiplicative LSTM (mLSTM), a recurrent neural network arc...
- Simplified Long Short-term Memory Recurrent Neural Networks: part I (07/14/2017): We present five variants of the standard Long Short-term Memory (LSTM) r...
- State Transition Modeling of the Smoking Behavior using LSTM Recurrent Neural Networks (01/07/2020): The use of sensors has pervaded everyday life in several applications in...
- Recurrent Transition Networks for Character Locomotion (10/04/2018): Manually authoring transition animations for a complete locomotion syste...
- Mogrifier LSTM (09/04/2019): Many advances in Natural Language Processing have been based upon more e...
- ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network (06/28/2019): In recent years, memory-augmented neural networks (MANNs) have shown prom...
- A Tree Architecture of LSTM Networks for Sequential Regression with Missing Data (05/22/2020): We investigate regression for variable length sequential data containing...
