Towards Binary-Valued Gates for Robust LSTM Training

06/08/2018
by Zhuohan Li, et al.

Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control the information flow in the recurrent computations (e.g., whether to skip some information or not), although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way to train LSTMs that pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed rather than in a middle state, which makes the results more interpretable. Empirical studies show that (1) although this seems to restrict the model capacity, there is no performance drop: we achieve better or comparable performance thanks to better generalization ability; and (2) the outputs of the gates are not sensitive to their inputs, so the LSTM unit can easily be compressed in multiple ways, e.g., by low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.
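
To make the idea of gates that sit near 0 or 1 concrete, here is a minimal sketch. It is not the training method of the paper, whose mechanism the abstract does not describe. As a stand-in, it assumes a sigmoid gate with a hypothetical `temperature` parameter and measures how many gate values land close to 0 or 1; all function names are illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sharpened_gate(pre_activation: float, temperature: float = 1.0) -> float:
    """Gate value sigmoid(pre_activation / temperature).

    `temperature` is a hypothetical knob, not taken from the paper:
    values below 1 push the output towards 0 or 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return sigmoid(pre_activation / temperature)


def fraction_near_binary(gates, margin: float = 0.1) -> float:
    """Share of gate values within `margin` of 0 or 1."""
    near = sum(1 for g in gates if g <= margin or g >= 1.0 - margin)
    return near / len(gates)


# Synthetic pre-activations stand in for the inputs to a gate (an assumption).
rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(1000)]

soft = [sharpened_gate(a, temperature=1.0) for a in pre_activations]
sharp = [sharpened_gate(a, temperature=0.1) for a in pre_activations]

print(f"soft gates near 0/1:  {fraction_near_binary(soft):.2f}")
print(f"sharp gates near 0/1: {fraction_near_binary(sharp):.2f}")
```

With the lower temperature, far more gate values fall within the margin of 0 or 1. A saturated gate also changes little when its input is perturbed slightly, which is the intuition behind the abstract's claim that such gates tolerate low-rank and low-precision approximation.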

research
04/13/2018

The unreasonable effectiveness of the forget gate

Given the success of the gated recurrent unit, a natural question is whe...
research
12/12/2016

Empirical Evaluation of A New Approach to Simplifying Long Short-term Memory (LSTM)

The standard LSTM, although it succeeds in the modeling long-range depen...
research
10/25/2019

A memory enhanced LSTM for modeling complex temporal dependencies

In this paper, we present Gamma-LSTM, an enhanced long short term memory...
research
11/21/2017

Cross Temporal Recurrent Networks for Ranking Question Answer Pairs

Temporal gates play a significant role in modern recurrent-based neural ...
research
05/09/2018

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

LSTMs were introduced to combat vanishing gradients in simple RNNs by au...
research
01/08/2018

Spiking memristor logic gates are a type of time-variant perceptron

Memristors are low-power memory-holding resistors thought to be useful f...
research
05/30/2018

Grow and Prune Compact, Fast, and Accurate LSTMs

Long short-term memory (LSTM) has been widely used for sequential data m...
