How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?

by Shayne Longpre et al.

Task-agnostic forms of data augmentation have proven widely effective in computer vision, even on pretrained models. In NLP, similar results are reported most commonly for low-data regimes, non-pretrained models, or situationally for pretrained models. In this paper we ask how effective these techniques really are when applied to pretrained transformers. Using two popular varieties of task-agnostic data augmentation (not tailored to any particular task), Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015), we conduct a systematic examination of their effects across 5 classification tasks, 6 datasets, and 3 variants of modern pretrained transformers: BERT, XLNet, and RoBERTa. We observe a negative result: techniques that previously reported strong improvements for non-pretrained models fail to consistently improve performance for pretrained transformers, even when training data is limited. We hope this empirical analysis helps inform practitioners where data augmentation techniques may confer improvements.
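To make "task-agnostic" concrete, the following is a minimal sketch of two of the four Easy Data Augmentation operations, random swap and random deletion; the other two, synonym replacement and random insertion, require a thesaurus and are omitted. The function names, default rates, and whitespace tokenization are illustrative assumptions, not the configuration used in the paper or in the original EDA release.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two distinct, randomly chosen positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p.

    If every token would be dropped, keep one random token so the
    example is never empty.
    """
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    if not kept:
        kept = [rng.choice(tokens)]
    return kept


def augment(sentence, n_swaps=1, p_delete=0.1, seed=0):
    """Return two augmented variants of a sentence (hypothetical helper).

    Whitespace tokenization and the default rates are assumptions made
    for illustration only.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    return [
        " ".join(random_swap(tokens, n_swaps, rng)),
        " ".join(random_deletion(tokens, p_delete, rng)),
    ]
```

Neither operation looks at the label or the task, which is what makes the augmentation task-agnostic: each variant would simply inherit the label of the sentence it was derived from.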




Code Repositories


Data Augmentation for NLP.
