Iterative Decoding for Compositional Generalization in Transformers

by Luana Ruiz, et al.

Deep learning models do well at generalizing to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In particular, in sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for even marginally longer examples than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq learning that (i) improves transformer compositional generalization and (ii) provides evidence that, in general, seq2seq transformers do not learn iterations that are not unrolled. Inspired by the idea of compositionality – that complex tasks can be solved by composing basic primitives – training examples are broken down into a sequence of intermediate steps that the transformer then learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. Through numerical experiments, we show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset, and solve the problem of calculating Cartesian products between vectors longer than those seen during training with 100% accuracy, a task at which seq2seq transformers have been shown to fail. We also illustrate a limitation of iterative decoding, specifically, that it can make sorting harder to learn on the CFQ dataset.
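The inference procedure described above — feed each intermediate output back as the next input until an end-of-iteration token appears — can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation; `model`, `toy_model`, and `EOI_TOKEN` are hypothetical placeholders (here the "model" performs one bubble-sort pass per call as a stand-in primitive):

```python
EOI_TOKEN = "<eoi>"  # hypothetical end-of-iteration marker

def iterative_decode(model, tokens, max_iters=50):
    """Repeatedly feed the model's output back as its next input
    until the model predicts the end-of-iteration token."""
    for _ in range(max_iters):  # safety cap in case <eoi> is never emitted
        output = model(tokens)              # one decoding step (one primitive)
        if output and output[-1] == EOI_TOKEN:
            return output[:-1]              # strip the marker and stop
        tokens = output                     # intermediate output -> next input
    return tokens

def toy_model(tokens):
    """Stand-in for a trained transformer: performs one bubble-sort
    pass, and emits <eoi> once the sequence is fully sorted."""
    out = list(tokens)
    swapped = False
    for i in range(len(out) - 1):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
            swapped = True
    if not swapped:
        out.append(EOI_TOKEN)
    return out
```

Because each call performs only one bounded step, the number of loop iterations — rather than the model itself — carries the recursion, which is the sense in which iterations are "unrolled" at inference time instead of inside the network.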




