When does data augmentation help generalization in NLP?

04/30/2020
by Rohan Jha, et al.

Neural models often exploit superficial ("weak") features to achieve good performance, rather than deriving the more general ("strong") features that we'd prefer a model to use. Overcoming this tendency is a central challenge in areas such as representation learning and ML fairness. Recent work has proposed using data augmentation (that is, generating training examples on which these weak features fail) as a means of encouraging models to prefer the stronger features. We design a series of toy learning problems to investigate the conditions under which such data augmentation is helpful. We show that augmenting with training examples on which the weak feature fails ("counterexamples") does succeed in preventing the model from relying on the weak feature, but often does not succeed in encouraging the model to use the stronger feature in general. We also find in many cases that the number of counterexamples needed to reach a given error rate is independent of the amount of training data, and that this type of data augmentation becomes less effective as the target strong feature becomes harder to learn.
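The counterexample-augmentation idea described above can be illustrated with a minimal toy sketch (not the paper's actual experimental setup; `make_example` and `train` are hypothetical helpers). The label is encoded in both a strong and a weak feature; in the base data the two features always agree, so a linear model has no reason to prefer one. Adding counterexamples in which the weak feature disagrees with the label pushes weight away from the weak feature:

```python
import math
import random

random.seed(0)

def make_example(label, weak_agrees):
    """Toy example: the strong feature always matches the label;
    the weak feature matches only when weak_agrees is True."""
    strong = 1.0 if label == 1 else -1.0
    weak = strong if weak_agrees else -strong
    return ([strong, weak], label)

def train(data, epochs=200, lr=0.1):
    """Plain logistic regression via SGD, no bias term."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # derivative of log loss w.r.t. z
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return w

# Base data: weak feature is perfectly predictive (agrees with label).
base = [make_example(random.randint(0, 1), True) for _ in range(200)]
# Counterexamples: the weak feature "fails" (disagrees with the label).
counter = [make_example(random.randint(0, 1), False) for _ in range(20)]

w_plain = train(base)          # strong and weak weights are identical
w_aug = train(base + counter)  # weight shifts toward the strong feature
```

On the base data the two features are indistinguishable, so the model splits its weight evenly; even a small number of counterexamples breaks the tie in favor of the strong feature, which mirrors the abstract's point that counterexamples discourage reliance on the weak feature (though in richer settings they may not teach the strong one).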

