Transformers Generalize Linearly

09/24/2021
by Jackson Petty, et al.

Natural language exhibits patterns of hierarchically governed dependencies, in which relations between words are sensitive to syntactic structure rather than linear ordering. While recurrent network models often fail to generalize in a hierarchically sensitive way (McCoy et al., 2020) when trained on ambiguous data, the improvement in performance of newer Transformer language models (Vaswani et al., 2017) on a range of syntactic benchmarks trained on large data sets (Goldberg, 2019; Warstadt et al., 2019) opens the question of whether these models might exhibit hierarchical generalization in the face of impoverished data. In this paper we examine patterns of structural generalization for Transformer sequence-to-sequence models and find that not only do Transformers fail to generalize hierarchically across a wide variety of grammatical mapping tasks, but they exhibit an even stronger preference for linear generalization than comparable recurrent networks.
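To make the hierarchical/linear contrast concrete, below is a minimal, illustrative Python sketch, not taken from the paper, of the kind of ambiguity that arises in English question formation, a mapping task used in this line of work (McCoy et al., 2020). The sentences, auxiliary list, and function names are assumptions chosen for illustration only.

```python
# Illustrative sketch: two rules for forming a yes/no question from a
# declarative sentence. On simple training sentences they agree; on
# sentences with a relative clause they diverge, which is what lets
# generalization experiments tell them apart.

AUXILIARIES = {"can", "will", "does", "doesn't"}

def move_first_aux(tokens):
    """Linear rule: front the first auxiliary in the word string."""
    for i, tok in enumerate(tokens):
        if tok in AUXILIARIES:
            return [tok] + tokens[:i] + tokens[i + 1:]
    return tokens

def move_main_aux(tokens, main_aux_index):
    """Hierarchical rule: front the main-clause auxiliary.
    The index is supplied directly here; a real grammar would locate it
    from the syntactic parse rather than from linear position."""
    aux = tokens[main_aux_index]
    return [aux] + tokens[:main_aux_index] + tokens[main_aux_index + 1:]

# Ambiguous training item: both rules yield "can the bird sing".
simple = "the bird can sing".split()
assert move_first_aux(simple) == move_main_aux(simple, 2)

# Disambiguating test item with a relative clause.
complex_ = "the bird that can sing will fly".split()
print(" ".join(move_first_aux(complex_)))    # linear: "can the bird that sing will fly"
print(" ".join(move_main_aux(complex_, 5)))  # hierarchical: "will the bird that can sing fly"
```

A model trained only on the ambiguous items can fit the data with either rule; the paper's finding is that Transformer sequence-to-sequence models tend to behave like the linear rule on the disambiguating items, even more strongly than comparable recurrent networks.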


