
Iterative Decoding for Compositional Generalization in Transformers

by   Luana Ruiz, et al.
University of Pennsylvania

Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In particular, in sequence-to-sequence (seq2seq) learning, transformers often fail to predict correct outputs for examples even marginally longer than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq learning that (i) improves transformer compositional generalization and (ii) provides evidence that, in general, seq2seq transformers do not learn iterations that are not unrolled. Inspired by the idea of compositionality – that complex tasks can be solved by composing basic primitives – training examples are broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. Through numerical experiments, we show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset, and solve with 100% accuracy the problem of calculating Cartesian products between vectors longer than those seen during training, a task at which seq2seq transformers have been shown to fail. We also illustrate a limitation of iterative decoding: on the CFQ dataset, it can make sorting harder to learn.
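The inference procedure described above can be sketched as a simple feedback loop. This is a minimal illustration, not the paper's implementation: the trained transformer is stood in for by a hypothetical toy step function (one bubble-sort pass per call), and the token name `<eoi>` is illustrative.

```python
END = "<eoi>"  # end-of-iteration token (name is illustrative)

def toy_model(tokens):
    """Stand-in for a trained transformer: performs one bubble-sort
    pass and appends the end-of-iteration token once nothing moves."""
    out = list(tokens)
    swapped = False
    for i in range(len(out) - 1):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
            swapped = True
    if not swapped:
        out.append(END)
    return out

def iterative_decode(tokens, model, max_iters=100):
    """Feed each intermediate output back as the next intermediate
    input until the model predicts the end-of-iteration token."""
    for _ in range(max_iters):
        tokens = model(tokens)
        if tokens and tokens[-1] == END:
            return tokens[:-1]  # strip the end-of-iteration token
    raise RuntimeError("no end-of-iteration token within max_iters")

print(iterative_decode([3, 1, 2], toy_model))  # -> [1, 2, 3]
```

The key difference from seq2seq decoding is that each call produces a full intermediate sequence rather than the final answer, so the model only ever has to learn one primitive step.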




Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language process...

Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding

Compositional generalization is a troubling blind spot for neural langua...

What is my math transformer doing? – Three results on interpretability and generalization

This paper investigates the failure cases and out-of-distribution behavi...

Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Incremental processing allows interactive systems to respond based on pa...

Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning

Out-of-distribution generalization (OODG) is a longstanding challenge fo...

Few-shot Sequence Learning with Transformers

Few-shot algorithms aim at learning new tasks provided only a handful of...

Neural Execution Engines: Learning to Execute Subroutines

A significant effort has been made to train neural networks that replica...