Exploring the Long-Term Generalization of Counting Behavior in RNNs

11/29/2022
by   Nadine El-Naggar, et al.

In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity to count given a suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this, and despite some positive empirical results for LSTMs on Dyck-1 languages, our experiments show that LSTMs fail to learn correct counting behavior on sequences significantly longer than those seen during training. ReLU networks show much larger variance in behavior and, in most cases, worse generalization. Long-sequence generalization is empirically correlated with validation loss, but reliable long-sequence generalization appears not to be practically achievable through backpropagation with current techniques. We demonstrate distinct failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that neither the saturation of LSTM gate activations nor the weight configuration that ReLU networks require for generalizable counting is reached under standard training regimes. In summary, learning generalizable counting behavior remains an open problem, and we discuss potential approaches for further research.
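To make concrete what "capacity to count with a suitable configuration" means for ReLU RNNs, here is a minimal hand-set sketch (not the paper's exact construction): with recurrent weight 1 and input weights +1 for '(' and -1 for ')', the hidden state tracks the nesting depth of a Dyck-1 prefix, since that depth never goes negative on valid prefixes and so the ReLU never clips the true count.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_counter(seq):
    """Hand-configured one-unit ReLU RNN tracking Dyck-1 bracket depth.

    h_t = relu(h_{t-1} + x_t), with x_t = +1 for '(' and -1 for ')'.
    On valid Dyck-1 prefixes the depth is always >= 0, so the ReLU
    acts as the identity and h_t equals the exact nesting depth.
    (Illustrative assumption: inputs are drawn from {'(', ')'} only.)
    """
    h = 0.0
    for ch in seq:
        x = 1.0 if ch == "(" else -1.0
        h = relu(h + x)
    return h
```

A string with only valid prefixes is a balanced Dyck-1 word exactly when the final state is 0; the paper's point is that while such weight settings exist, gradient-based training does not reliably find them.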


Related research

05/13/2018 · On the Practical Computational Power of Finite Precision RNNs for Language Recognition
04/07/2023 · Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks
05/16/2017 · Subregular Complexity and Deep Learning
05/16/2020 · Achieving Online Regression Performance of LSTMs with Simple RNNs
10/29/2018 · Counting in Language with RNNs
03/01/2018 · Learning Longer-term Dependencies in RNNs with Auxiliary Losses
04/14/2020 · Exploring Cell Counting with Neural Arithmetic Logic Units
