Automatically Learning Data Augmentation Policies for Dialogue Tasks

09/27/2019
by   Tong Niu, et al.

Automatic data augmentation (AutoAugment) (Cubuk et al., 2019) searches for optimal perturbation policies via a controller trained using the performance rewards of a sampled policy on the target task, hence reducing data-level model bias. Although AutoAugment is a powerful algorithm, that work focused on computer vision tasks, where it is comparatively easy to apply imperceptible perturbations without changing an image's semantic meaning. In our work, we adapt AutoAugment to automatically discover effective perturbation policies for natural language processing (NLP) tasks such as dialogue generation. We start with a pool of atomic operations that apply subtle, semantic-preserving perturbations to the source inputs of a dialogue task (e.g., different POS-tag types of stopword dropout, grammatical errors, and paraphrasing). Next, we allow the controller to learn more complex augmentation policies by searching over the space of combinations of these atomic operations. We also explore conditioning the controller on the source inputs of the target task, since a given strategy may not apply to inputs that lack the linguistic features it requires. Empirically, we demonstrate that both our input-agnostic and input-aware controllers discover useful data augmentation policies and achieve significant improvements over the previous state of the art, including models trained on manually designed policies.
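The search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three operations, the `Controller` class (a table of logits updated with a REINFORCE-style rule and a moving-average baseline), and the `reward_fn` argument are all hypothetical stand-ins. In the paper the reward comes from the target-task performance of a model trained with the sampled policy; here `reward_fn` is any function that maps a policy to a number.

```python
import math
import random

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "in", "and"}


def identity(tokens, rng):
    """Leave the input unchanged."""
    return list(tokens)


def stopword_dropout(tokens, rng, p=0.5):
    """Drop each stopword with probability p; never return an empty list."""
    kept = [t for t in tokens
            if t.lower() not in STOPWORDS or rng.random() >= p]
    return kept or list(tokens[:1])


def swap_adjacent(tokens, rng):
    """Swap one adjacent pair (a crude stand-in for a grammatical error)."""
    out = list(tokens)
    if len(out) >= 2:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Pool of atomic operations the controller chooses from (assumed, not the paper's pool).
OPERATIONS = [identity, stopword_dropout, swap_adjacent]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Controller:
    """Input-agnostic controller: one categorical distribution per policy slot."""

    def __init__(self, num_ops, num_slots=2, lr=0.5, seed=0):
        self.logits = [[0.0] * num_ops for _ in range(num_slots)]
        self.lr = lr
        self.baseline = 0.0
        self.rng = random.Random(seed)

    def sample_policy(self):
        """Sample one operation index per slot."""
        return [self.rng.choices(range(len(row)), weights=softmax(row))[0]
                for row in self.logits]

    def update(self, policy, reward):
        """REINFORCE-style update with a moving-average baseline."""
        advantage = reward - self.baseline
        self.baseline = 0.9 * self.baseline + 0.1 * reward
        for row, chosen in zip(self.logits, policy):
            probs = softmax(row)
            for k in range(len(row)):
                grad = (1.0 if k == chosen else 0.0) - probs[k]
                row[k] += self.lr * advantage * grad


def apply_policy(policy, tokens, rng):
    """Apply the policy's operations to a token list, in order."""
    for op_index in policy:
        tokens = OPERATIONS[op_index](tokens, rng)
    return tokens


def search(reward_fn, steps=200, seed=0):
    """Sample policies, score them with reward_fn, and update the controller."""
    controller = Controller(len(OPERATIONS), seed=seed)
    for _ in range(steps):
        policy = controller.sample_policy()
        controller.update(policy, reward_fn(policy))
    return controller
```

A toy call such as `search(lambda policy: 1.0 if policy[0] == 1 else 0.0)` shifts probability mass in the first slot toward stopword dropout. An input-aware variant would additionally condition the sampling distribution on features of the source input, which this sketch omits.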
