Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

11/22/2022
by Satwik Bhattamishra, et al.

Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions, which have low sensitivity, Transformers generalize near perfectly even in the presence of noisy labels, whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong, quantifiable evidence of differences in the inductive biases of Transformers and recurrent models, which may help explain Transformers' effective generalization despite their relatively limited expressiveness.
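For concreteness, here is a minimal sketch (not taken from the paper) of the sensitivity measure the abstract refers to: the sensitivity of a Boolean function f at an input x is the number of single-bit flips that change f(x), and average sensitivity is its mean over all inputs. The function names and example functions below are illustrative assumptions, not the authors' code; the sketch only shows, for small n, why a sparse function such as an AND over 3 of 10 bits has much lower average sensitivity than parity over all 10 bits.

```python
# Sketch: exact average sensitivity of a Boolean function f: {0,1}^n -> {0,1},
# computed by enumerating all 2^n inputs (feasible only for small n).
import itertools

def sensitivity_at(f, x):
    """Number of coordinates i such that flipping x[i] changes f(x)."""
    base = f(x)
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(flipped) != base:
            count += 1
    return count

def average_sensitivity(f, n):
    """Mean sensitivity of f over all 2^n Boolean inputs."""
    inputs = list(itertools.product((0, 1), repeat=n))
    return sum(sensitivity_at(f, x) for x in inputs) / len(inputs)

if __name__ == "__main__":
    n = 10
    # A sparse function: depends on only 3 of the 10 input bits.
    sparse_and = lambda x: int(x[0] and x[1] and x[2])
    # Parity over all bits: every bit flip changes the output.
    parity = lambda x: sum(x) % 2
    print("sparse AND of 3 of 10 bits:", average_sensitivity(sparse_and, n))  # 0.75
    print("parity over all 10 bits:   ", average_sensitivity(parity, n))      # 10.0
```

Under this toy measure, the sparse AND has average sensitivity 0.75 while parity attains the maximum value n = 10, which is the kind of gap the paper's low-sensitivity bias claims are about.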

Related research

- 09/23/2020: On the Ability of Self-Attention Networks to Recognize Counter Languages
- 10/11/2021: Leveraging Transformers for StarCraft Macromanagement Prediction
- 04/21/2021: Sensitivity as a Complexity Measure for Sequence Classification Tasks
- 07/05/2022: Neural Networks and the Chomsky Hierarchy
- 11/06/2020: Extending Equational Monadic Reasoning with Monad Transformers
- 11/08/2020: On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages
- 10/14/2022: Pretrained Transformers Do not Always Improve Robustness
