The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

10/14/2021
by   Róbert Csordás, et al.

Despite successes across a broad range of applications, Transformers have had limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture: copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depth. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
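The copy gate mentioned above lets each Transformer column either accept the new value produced by its attention/FFN sublayers or keep (copy) its current state unchanged, so information can pass through layers untouched until it is needed. A minimal NumPy sketch of that gating step, with assumed parameter names (`w_g`, `b_g` are illustrative, not taken from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_gate_step(x, update, w_g, b_g):
    """Illustrative copy-gate step: mix the sublayer update with the
    unchanged input, per position.  Gate near 0 -> copy the input
    through unchanged; gate near 1 -> accept the new update."""
    g = sigmoid(x @ w_g + b_g)           # per-position gate in (0, 1)
    return g * update + (1.0 - g) * x    # convex mix of update and copy

# Toy usage: 4 positions, model width 8.  A strongly negative gate
# bias makes columns default to copying until they learn otherwise.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
update = rng.normal(size=(4, 8))
w_g = 0.1 * rng.normal(size=(8, 8))
b_g = np.full(8, -3.0)
out = copy_gate_step(x, update, w_g, b_g)
```

With the gate saturated off, the column is a pure skip connection; with it saturated on, the column behaves like a standard Transformer update.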


