LSTM: A Search Space Odyssey

03/13/2015
by   Klaus Greff, et al.

Several variants of the Long Short-Term Memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful fANOVA framework. In total, we summarize the results of 5400 experimental runs (≈ 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
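To make the two components singled out above more concrete, the following is a minimal sketch of one time step of a single-unit LSTM in plain Python. It is not the authors' implementation: the function name `lstm_step`, the parameter dictionary `p` and its keys, and the switches `use_forget_gate` and `output_activation` are all hypothetical names chosen for illustration. Peephole connections are omitted for brevity, and everything is scalar rather than vector-valued.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used for all three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p, use_forget_gate=True,
              output_activation=math.tanh):
    """One time step of a single-unit LSTM (illustrative sketch only).

    x, h_prev, c_prev are scalars; p is a dict of scalar weights
    (hypothetical key names: w* = input weight, r* = recurrent weight,
    b* = bias, for z = block input, i/f/o = input/forget/output gate).
    """
    z = math.tanh(p["wz"] * x + p["rz"] * h_prev + p["bz"])  # block input
    i = sigmoid(p["wi"] * x + p["ri"] * h_prev + p["bi"])    # input gate
    if use_forget_gate:
        f = sigmoid(p["wf"] * x + p["rf"] * h_prev + p["bf"])  # forget gate
    else:
        f = 1.0  # no forget gate: the old cell state is always kept in full
    o = sigmoid(p["wo"] * x + p["ro"] * h_prev + p["bo"])    # output gate

    c = i * z + f * c_prev            # new cell state
    h = o * output_activation(c)      # block output
    return h, c


# Example: all weights zero, so every gate evaluates to 0.5.
params = {k: 0.0 for k in ("wz", "rz", "bz", "wi", "ri", "bi",
                           "wf", "rf", "bf", "wo", "ro", "bo")}
h, c = lstm_step(0.0, 0.0, 1.0, params)
```

Passing `use_forget_gate=False` or `output_activation=lambda c: c` gives a rough picture of what removing the forget gate or the output activation function means in this simplified setting.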


research
01/02/2019

Performance of Three Slim Variants of The Long Short-Term Memory (LSTM) Layer

The Long Short-Term Memory (LSTM) layer is an important advancement in t...
research
01/12/2017

Simplified Gating in Long Short-term Memory (LSTM) Recurrent Neural Networks

The standard LSTM recurrent neural networks while very powerful in long-...
research
08/17/2017

An Improved Residual LSTM Architecture for Acoustic Modeling

Long Short-Term Memory (LSTM) is the primary recurrent neural networks a...
research
12/12/2016

Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)

The standard LSTM, although it succeeds in the modeling long-range depen...
research
03/12/2018

From Nodes to Networks: Evolving Recurrent Neural Networks

Gated recurrent networks such as those composed of Long Short-Term Memor...
research
06/22/2018

Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory

Recurrent neural networks (RNNs) have been drawing much attention with g...
research
10/16/2018

Reduced-Gate Convolutional LSTM Using Predictive Coding for Spatiotemporal Prediction

Spatiotemporal sequence prediction is an important problem in deep learn...
