Iterative Decoding for Compositional Generalization in Transformers

10/08/2021
by Luana Ruiz, et al.

Deep learning models do well at generalizing to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In particular, in sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for examples even marginally longer than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq learning that (i) improves transformer compositional generalization and (ii) evidences that, in general, seq2seq transformers do not learn iterations that are not unrolled. Inspired by the idea of compositionality – that complex tasks can be solved by composing basic primitives – training examples are broken down into a sequence of intermediate steps that the transformer then learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. Through numerical experiments, we show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset, and solve the problem of calculating Cartesian products between vectors longer than those seen during training with 100% accuracy, a task at which seq2seq transformers have been shown to fail. We also illustrate a limitation of iterative decoding, specifically, that it can make sorting harder to learn on the CFQ dataset.
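As a rough illustration of the inference procedure described above, the feedback loop can be sketched as follows. This is a hypothetical sketch, not the authors' released code: the model and tokenizer interfaces and the end-of-iteration token string are assumptions made for the example.

    # Minimal sketch of iterative decoding at inference time (hypothetical names).
    # Assumes a trained seq2seq-style `model` with a `generate` method and a
    # `tokenizer` that maps strings to token ids and back.

    EOI_TOKEN = "<eoi>"      # assumed end-of-iteration marker on the final step
    MAX_ITERATIONS = 50      # safety cap so a malformed input cannot loop forever

    def iterative_decode(model, tokenizer, source: str) -> str:
        """Feed each intermediate output back in as the next intermediate input."""
        current = source
        for _ in range(MAX_ITERATIONS):
            output_ids = model.generate(tokenizer(current))  # one decoding iteration
            decoded = tokenizer.decode(output_ids)
            if EOI_TOKEN in decoded:                         # model signals the final answer
                return decoded.replace(EOI_TOKEN, "").strip()
            current = decoded                                # otherwise iterate on the output
        return current                                       # fall back to the last intermediate output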

Related research

10/02/2022 - Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Transformer networks have seen great success in natural language process...

01/27/2022 - Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding
Compositional generalization is a troubling blind spot for neural langua...

08/09/2021 - Making Transformers Solve Compositional Tasks
Several studies have reported the inability of Transformer models to gen...

10/31/2022 - What is my math transformer doing? – Three results on interpretability and generalization
This paper investigates the failure cases and out-of-distribution behavi...

09/15/2021 - Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU
Incremental processing allows interactive systems to respond based on pa...

08/29/2023 - Can transformers learn the greatest common divisor?
I investigate the capability of small transformers to compute the greate...

01/17/2023 - Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning
In-context learning (ICL) is a type of prompting where a transformer mod...
