Bivariate Beta LSTM

05/25/2019
by Kyungwoo Song et al.

Long Short-Term Memory (LSTM) infers long-term dependencies through a cell state maintained by the input and forget gate structures, which model each gate output as a value in [0,1] through a sigmoid function. However, a sigmoid gate produces a single gradual, deterministic value, so it cannot flexibly represent multi-modality or skewness. Moreover, previous models do not model correlation between the gates, which could serve as a new way to incorporate domain knowledge. This paper proposes a new gate structure based on the bivariate Beta distribution. The proposed gate structure enables hierarchical probabilistic modeling of the gates within the LSTM cell, so modelers can customize the cell state flow. We also observe that probability density estimation is what enables this structured, flexible gate modeling. Furthermore, we show theoretically and confirm empirically that the bivariate Beta gate structure alleviates the gradient vanishing problem. We demonstrate the effectiveness of the bivariate Beta gate structure on sentence classification, image classification, polyphonic music modeling, and image caption generation.
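To make the idea concrete, here is a minimal sketch of one LSTM step with stochastic, correlated Beta gates. This is not the authors' released code: it assumes the common three-Gamma (Olkin and Liu) construction of a bivariate Beta, in which a shared Gamma variable correlates the input and forget gates, and all class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BivariateBetaLSTMCell(nn.Module):
    """One LSTM step where the forget and input gates are sampled
    from correlated Beta distributions (hypothetical sketch)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # One affine map over [x_t, h_{t-1}] produces:
        # 3*H Gamma shape pre-activations (a1, a2, a3) for the gates,
        # H for the candidate cell state g, and H for the output gate o.
        self.linear = nn.Linear(input_size + hidden_size, 5 * hidden_size)

    def forward(self, x, state):
        h, c = state
        z = self.linear(torch.cat([x, h], dim=-1))
        a1, a2, a3, g, o = z.chunk(5, dim=-1)

        # Positive Gamma shapes; rsample keeps gradients flowing via
        # implicit reparameterization of the Gamma distribution.
        eps = 1e-6
        g1 = torch.distributions.Gamma(F.softplus(a1) + eps, 1.0).rsample()
        g2 = torch.distributions.Gamma(F.softplus(a2) + eps, 1.0).rsample()
        g3 = torch.distributions.Gamma(F.softplus(a3) + eps, 1.0).rsample()

        # Olkin-Liu construction: sharing g3 correlates the two gates.
        f_gate = g1 / (g1 + g3)  # forget gate, marginally Beta(a1, a3)
        i_gate = g2 / (g2 + g3)  # input gate, marginally Beta(a2, a3)

        c_next = f_gate * c + i_gate * torch.tanh(g)
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, (h_next, c_next)


# Usage: one step on a toy batch.
cell = BivariateBetaLSTMCell(input_size=8, hidden_size=16)
x = torch.randn(4, 8)
h0, c0 = torch.zeros(4, 16), torch.zeros(4, 16)
h1, _ = cell(x, (h0, c0))
```

Because the gates are random variables rather than deterministic sigmoid outputs, their distributions can be skewed or spread out, and the shared Gamma component lets a modeler inject a prior belief about how the input and forget gates should co-vary; the paper's exact parameterization and output-gate treatment may differ from this sketch.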

Related research

04/13/2018  The unreasonable effectiveness of the forget gate
Given the success of the gated recurrent unit, a natural question is whe...

05/17/2016  Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM
Modeling human conversations is the essence for building satisfying chat...

10/04/2022  Fast Saturating Gate for Learning Long Time Scales with Recurrent Neural Networks
Gate functions in recurrent models, such as an LSTM and GRU, play a cent...

03/08/2023  Flexible and slim device switching air blowing and suction by a single airflow control
This study proposes a soft robotic device with a slim and flexible body ...

05/30/2019  Quantization Loss Re-Learning Method
In order to quantize the gate parameters of the LSTM (Long Short-Term Me...

08/31/2021  Working Memory Connections for LSTM
Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of...

06/01/2018  Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training
Exploiting sparsity enables hardware systems to run neural networks fast...
