On the Ability of Self-Attention Networks to Recognize Counter Languages

09/23/2020
by   Satwik Bhattamishra, et al.

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past work suggests that LSTMs generalize very well on regular languages and have close connections with counter languages. In this work, we systematically study the ability of Transformers to model such languages, as well as the role of their individual components in doing so. We first provide a construction of Transformers for a subclass of counter languages, including well-studied languages such as n-ary Boolean Expressions, Dyck-1, and its generalizations. In experiments, we find that Transformers do well on this subclass, and that their learned mechanism strongly correlates with our construction. Perhaps surprisingly, in contrast to LSTMs, Transformers do well only on a subset of regular languages, with performance degrading as the languages become more complex according to a well-known measure of complexity. Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors and the influence of positional encoding schemes on the learning and generalization abilities of the model.
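To make the connection to counter languages concrete, here is a minimal sketch (not taken from the paper) of how Dyck-1, the language of well-balanced strings over a single bracket pair, can be recognized with one counter: increment on an opening bracket, decrement on a closing one, reject if the counter ever goes negative, and accept only if it ends at zero. The function name and the choice of '(' and ')' as the alphabet are illustrative assumptions.

```python
def is_dyck1(s: str) -> bool:
    """Recognize Dyck-1 (well-balanced bracket strings) with a single counter."""
    counter = 0
    for ch in s:
        if ch == "(":
            counter += 1
        elif ch == ")":
            counter -= 1
            if counter < 0:   # a closing bracket with no matching opener
                return False
        else:
            return False      # symbol outside the assumed alphabet {'(', ')'}
    return counter == 0       # accept only if every opener was closed


# Quick checks
assert is_dyck1("(()())")
assert not is_dyck1("())(")
assert not is_dyck1("(((")
```

Generalizations such as Shuffle-Dyck can be handled similarly by running one such counter per bracket type, which is the kind of counting behavior the paper's construction asks self-attention to reproduce.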

Related research

09/02/2023  Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
11/22/2022  Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
06/16/2019  Theoretical Limitations of Self-Attention in Neural Sequence Models
10/09/2020  How Can Self-Attention Networks Recognize Dyck-n Languages?
04/13/2022  Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
04/15/2020  On the Linguistic Capacity of Real-Time Counter Automata
02/24/2022  Overcoming a Theoretical Limitation of Self-Attention
