Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality

11/28/2022
by Yichen Jiang et al.

Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models. In this work, we analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias (i.e., a source sequence already mapped to a target sequence is less likely to be mapped to other target sequences), and a tendency to memorize whole examples rather than separating structure from content. We propose two techniques to address these issues: Mutual Exclusivity Training, which uses an unlikelihood-based loss to prevent the model from producing previously seen outputs when given novel, unseen examples; and prim2primX data augmentation, which automatically diversifies the arguments of every syntactic function to discourage memorization and provide a compositional inductive bias without exposing test-set data. Combining these two techniques, we show substantial empirical improvements with standard sequence-to-sequence models (LSTMs and Transformers) on two widely used compositionality datasets: SCAN and COGS. Finally, we provide an analysis characterizing the improvements and the remaining challenges, along with detailed ablations of our method. Our code is available at https://github.com/owenzx/met-primaug
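
To make the two techniques concrete, below is a minimal PyTorch sketch of (a) a mutual-exclusivity-style unlikelihood loss and (b) a prim2primX-style augmentation on a toy SCAN-like lexicon. This is an illustration under assumed interfaces, not the authors' released implementation (see the repository linked above); names such as met_loss, prim2primx, and PRIMITIVES are made up for the example.

```python
# Sketch of the two ideas from the abstract; all names are illustrative
# assumptions, not the released met-primaug code.
import torch
import torch.nn.functional as F

def met_loss(logits, gold_ids, seen_ids, alpha=1.0):
    """Mutual-exclusivity-style training loss (sketch).

    Combines the usual cross-entropy term on the gold target with an
    unlikelihood term that pushes down the probability of an already-seen
    target, discouraging the model from reusing old outputs for novel inputs.

    logits:   (batch, seq_len, vocab) decoder scores
    gold_ids: (batch, seq_len) gold target token ids
    seen_ids: (batch, seq_len) token ids of a previously seen target to
              penalize (an assumed input format)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Standard likelihood term on the gold sequence.
    nll = F.nll_loss(log_probs.transpose(1, 2), gold_ids)
    # Unlikelihood term: -log(1 - p(seen token)), averaged over positions.
    p_seen = log_probs.gather(-1, seen_ids.unsqueeze(-1)).squeeze(-1).exp()
    unlikelihood = -torch.log1p(-p_seen.clamp(max=1.0 - 1e-6)).mean()
    return nll + alpha * unlikelihood

# Toy SCAN-like lexicon mapping primitive commands to actions.
PRIMITIVES = {"jump": "JUMP", "walk": "WALK"}

def prim2primx(src, tgt, k=2):
    """prim2primX-style augmentation (sketch): for each primitive in a
    training pair, mint k fresh placeholder primitives and substitute them
    consistently on both sides, so every syntactic function is seen with
    many different arguments."""
    pairs = [(src, tgt)]
    for prim, action in PRIMITIVES.items():
        if prim not in src.split():
            continue
        for i in range(k):
            pairs.append((src.replace(prim, f"{prim}{i}"),
                          tgt.replace(action, f"{action}{i}")))
    return pairs

# e.g. prim2primx("jump twice", "JUMP JUMP") also yields
# ("jump0 twice", "JUMP0 JUMP0") and ("jump1 twice", "JUMP1 JUMP1")
```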


Related research

Learning to Substitute Spans towards Improving Compositional Generalization (06/05/2023)
Despite the rising prevalence of neural sequence models, recent empirica...

Sequence-Level Mixed Sample Data Augmentation (11/18/2020)
Despite their empirical success, neural networks still have difficulty c...

Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks (09/30/2021)
Systematic compositionality is an essential mechanism in human language,...

Good-Enough Compositional Data Augmentation (04/21/2019)
We propose a simple data augmentation protocol aimed at providing a comp...

Disentangled Sequence to Sequence Learning for Compositional Generalization (10/09/2021)
There is mounting evidence that existing neural network models, in parti...

Jump to better conclusions: SCAN both left and right (09/12/2018)
Lake and Baroni (2018) recently introduced the SCAN data set, which cons...

Scaling Up Influence Functions (12/06/2021)
We address efficient calculation of influence functions for tracking pre...
