When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

03/17/2022
by Ehsan Kamalloo et al.

Data Augmentation (DA) is known to improve the generalizability of deep neural networks. Most existing DA techniques naively add a certain number of augmented samples without considering their quality or the computational cost they add to training. To tackle this problem, a common strategy, adopted by several state-of-the-art DA methods, is to adaptively generate or re-weight augmented samples with respect to the task objective during training. However, these adaptive DA methods (1) are computationally expensive and not sample-efficient, and (2) are designed only for a specific setting. In this work, we present a universal DA technique, called Glitter, that overcomes both issues. Glitter can be plugged into any DA method, making training sample-efficient without sacrificing performance. From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA. The task objective is then optimized on the selected subset without altering the training strategy. Our thorough experiments on the GLUE benchmark, SQuAD, and HellaSwag under three widely used training setups, consistency training, self-distillation, and knowledge distillation, reveal that Glitter is substantially faster to train and achieves competitive performance compared to strong baselines.
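To make the selection step concrete, below is a minimal PyTorch sketch of the worst-case selection described in the abstract, written for a plain classification setting. The function names (select_worst_case, training_step), the pool size, and the subset size j are illustrative assumptions, not the authors' released implementation; it assumes a model that maps an input tensor to logits.

```python
# Minimal sketch (assumed names/shapes, not the paper's actual code):
# from a pre-generated pool of augmented samples, pick the j samples
# with maximal task loss and train on them alongside the originals.
import torch
import torch.nn.functional as F

def select_worst_case(model, pool_x, pool_y, j):
    """Return the j augmented samples from the pool with maximal loss.

    pool_x: (k, ...) tensor of k pre-generated augmented candidates;
    pool_y: (k,) tensor of their labels.
    """
    model.eval()
    with torch.no_grad():  # selection itself needs no gradients
        logits = model(pool_x)
        losses = F.cross_entropy(logits, pool_y, reduction="none")
    worst = torch.topk(losses, k=j).indices  # maximal-loss candidates
    model.train()
    return pool_x[worst], pool_y[worst]

def training_step(model, optimizer, orig_x, orig_y, pool_x, pool_y, j=4):
    # The task objective is optimized, unchanged, on the originals
    # plus the selected worst-case subset.
    sel_x, sel_y = select_worst_case(model, pool_x, pool_y, j)
    x = torch.cat([orig_x, sel_x])
    y = torch.cat([orig_y, sel_y])
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that selection runs under torch.no_grad(), so its only overhead is a forward pass over the pre-generated pool; this is consistent with the abstract's claim of sample-efficient training without altering the training strategy.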
