Learning Bounded Context-Free-Grammar via LSTM and the Transformer: Difference and Explanations

12/16/2021
by Hui Shi, et al.

Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks. Theoretical results show that both are Turing-complete and can represent any context-free language (CFL). In practice, it is often observed that Transformer models have better representation power than LSTMs, but the reason is barely understood. We study such practical differences between the LSTM and the Transformer and propose an explanation based on their latent space decomposition patterns. To achieve this goal, we introduce an oracle training paradigm, which forces the decomposition of the latent representations of the LSTM and the Transformer and supervises them with the transitions of the Pushdown Automaton (PDA) of the corresponding CFL. With the forced decomposition, we show that the performance upper bounds of the LSTM and the Transformer in learning CFLs are close: both can simulate a stack and perform stack operations along with state transitions. However, without the forced decomposition, LSTM models fail to capture the stack and stack operations, while the impact on the Transformer model is marginal. Lastly, we connect the experiments on the prototypical PDA to a real-world parsing task to re-verify the conclusions.
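To make the notion of "supervising with PDA transitions" concrete, here is a minimal, illustrative sketch (not the authors' code or grammars): a pushdown automaton for the Dyck-1 language of balanced parentheses, a prototypical CFL. The trace it produces, one stack operation per input token, is the kind of step-by-step oracle signal the abstract describes the models being trained to reproduce.

```python
# Illustrative sketch only: a PDA for the Dyck-1 language (balanced parentheses).
# At each step it records the stack operation and resulting stack depth,
# i.e. the per-token transition information an oracle could supervise with.

def dyck1_pda(tokens):
    """Run a PDA over a token sequence, recording stack operations."""
    stack, trace = [], []
    for tok in tokens:
        if tok == "(":
            stack.append("(")              # push on an opening bracket
            trace.append(("push", len(stack)))
        elif tok == ")":
            if not stack:                  # unmatched closer: reject
                return False, trace
            stack.pop()                    # pop on a closing bracket
            trace.append(("pop", len(stack)))
        else:
            return False, trace            # symbol outside the alphabet
    return len(stack) == 0, trace          # accept iff the stack is empty

accepted, trace = dyck1_pda(list("(()())"))
print(accepted)  # True
print(trace)     # [('push', 1), ('push', 2), ('pop', 1), ('push', 2), ('pop', 1), ('pop', 0)]
```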
