
Iterative Decoding for Compositional Generalization in Transformers

10/08/2021
by Luana Ruiz, et al.
Google
University of Pennsylvania

Deep learning models do well at generalizing to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In particular, in sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for even marginally longer examples than those seen during training. This paper introduces iterative decoding, an alternative to seq2seq learning that (i) improves transformer compositional generalization and (ii) evidences that, in general, seq2seq transformers do not learn iterations that are not unrolled. Inspired by the idea of compositionality – that complex tasks can be solved by composing basic primitives – training examples are broken down into a sequence of intermediate steps that the transformer then learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. Through numerical experiments, we show that transformers trained via iterative decoding outperform their seq2seq counterparts on the PCFG dataset, and solve the problem of calculating Cartesian products between vectors longer than those seen during training with 100% accuracy, a task at which seq2seq models have been shown to fail. We also illustrate a limitation of iterative decoding, specifically, that it can make sorting harder to learn on the CFQ dataset.
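The inference procedure described above, feeding each intermediate output back to the model as the next input until an end-of-iteration token is predicted, amounts to a plain decoding loop. The following is a minimal illustrative sketch, not the paper's implementation: step_fn stands in for one greedy decode of the trained transformer, and the <eoi> marker, the function names, and the toy bubble-sort step are all assumptions made for demonstration.

# Minimal sketch of an iterative-decoding inference loop (illustrative only).
END_OF_ITERATION = "<eoi>"  # hypothetical end-of-iteration token

def iterative_decode(step_fn, tokens, max_iters=50):
    """Repeatedly apply one intermediate decoding step until <eoi> is emitted."""
    for _ in range(max_iters):
        tokens = step_fn(tokens)              # produce the next intermediate output
        if tokens and tokens[-1] == END_OF_ITERATION:
            return tokens[:-1]                # strip the end-of-iteration marker
    return tokens                             # fall back after max_iters steps

# Toy stand-in for the model: one bubble-sort pass per call, emitting <eoi>
# once the sequence is sorted. This only illustrates the control flow, not the
# paper's tasks or results.
def bubble_step(tokens):
    xs = list(tokens)
    for i in range(len(xs) - 1):
        if xs[i] > xs[i + 1]:
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
            return xs
    return xs + [END_OF_ITERATION]

print(iterative_decode(bubble_step, [3, 1, 2]))  # -> [1, 2, 3]

The key design point the loop captures is that each intermediate output is a full sequence the model can consume again, so the number of decoding iterations at inference time is not fixed in advance but determined by when the end-of-iteration token appears.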


10/02/2022

Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks

Transformer networks have seen great success in natural language process...
01/27/2022

Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding

Compositional generalization is a troubling blind spot for neural langua...
10/31/2022

What is my math transformer doing? – Three results on interpretability and generalization

This paper investigates the failure cases and out-of-distribution behavi...
09/15/2021

Towards Incremental Transformers: An Empirical Analysis of Transformer Models for Incremental NLU

Incremental processing allows interactive systems to respond based on pa...
10/07/2022

Out-of-Distribution Generalization in Algorithmic Reasoning Through Curriculum Learning

Out-of-distribution generalization (OODG) is a longstanding challenge fo...
12/17/2020

Few-shot Sequence Learning with Transformers

Few-shot algorithms aim at learning new tasks provided only a handful of...
06/15/2020

Neural Execution Engines: Learning to Execute Subroutines

A significant effort has been made to train neural networks that replica...