Good-Enough Compositional Data Augmentation

04/21/2019
by Jacob Andreas, et al.

We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces relative error rate by up to 87% on problems from the diagnostic SCAN tasks and by 16% on semantic parsing problems from the GeoQuery dataset. Applied to n-gram language modeling, it reduces perplexity by roughly 1% on small datasets in several languages.
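The core substitution idea can be sketched compactly. The Python snippet below is a minimal illustration under strong simplifying assumptions, not the authors' released implementation: fragments are restricted to single contiguous tokens (the paper allows multi-token and discontinuous fragments), an environment is simply the rest of the sentence with the fragment replaced by a gap, and the helper names geca_augment and environments are invented for this example.

    # Minimal sketch of fragment-substitution augmentation (illustrative only).
    # Assumptions: fragments are single tokens; an environment is the whole
    # remaining sentence with the fragment replaced by a gap symbol.
    from collections import defaultdict
    from itertools import combinations

    GAP = "<gap>"

    def environments(sentence):
        """Yield (fragment, environment) pairs for every single-token fragment."""
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            env = tuple(tokens[:i] + [GAP] + tokens[i + 1:])
            yield tok, env

    def geca_augment(corpus):
        """Return synthetic sentences built by swapping interchangeable fragments.

        Two fragments are treated as interchangeable if they occur in at least
        one shared environment somewhere in the corpus.
        """
        env_to_frags = defaultdict(set)   # environment -> fragments seen in it
        frag_to_envs = defaultdict(set)   # fragment -> environments it occurs in
        for sentence in corpus:
            for frag, env in environments(sentence):
                env_to_frags[env].add(frag)
                frag_to_envs[frag].add(env)

        synthetic = set()
        for frags in env_to_frags.values():
            # Fragments sharing this environment are treated as interchangeable.
            for a, b in combinations(sorted(frags), 2):
                # Substitute each fragment into the other's environments.
                for frag, other in ((a, b), (b, a)):
                    for env in frag_to_envs[frag]:
                        synthetic.add(" ".join(other if t == GAP else t for t in env))
        return synthetic - set(corpus)

    if __name__ == "__main__":
        corpus = [
            "she picks the wug up",
            "she puts the wug down",
            "she picks the cat up",
        ]
        # "wug" and "cat" share the environment "she picks the ___ up", so the
        # sketch licenses the new example "she puts the cat down".
        for s in sorted(geca_augment(corpus)):
            print(s)

Run on the toy corpus, the sketch produces the single synthetic example "she puts the cat down": because "wug" and "cat" are attested in one common environment, each may be substituted into the other's remaining environments, which is the compositional inductive bias the protocol is meant to inject.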


Related research

Good-Enough Example Extrapolation (09/12/2021)
This paper asks whether extrapolating the hidden space distribution of t...

Learning to Recombine and Resample Data for Compositional Generalization (10/08/2020)
Flexible neural models outperform grammar- and automaton-based counterpa...

SUBS: Subtree Substitution for Compositional Semantic Parsing (05/03/2022)
Although sequence-to-sequence models often achieve good performance in s...

Sequence-Level Mixed Sample Data Augmentation (11/18/2020)
Despite their empirical success, neural networks still have difficulty c...

Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality (11/28/2022)
Recent datasets expose the lack of the systematic generalization ability...

Gender-Inclusive Grammatical Error Correction through Augmentation (06/12/2023)
In this paper we show that GEC systems display gender bias related to th...

Conditional set generation using Seq2seq models (05/25/2022)
Conditional set generation learns a mapping from an input sequence of to...
