Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

06/13/2023
by Zhengxiang Shi, et al.

In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of downstream tasks. In the most favourable case, continued pre-training improves FT performance by more than 10% in the few-shot learning setting. Our findings highlight the potential of DA as a powerful tool for bolstering LMs' performance.
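
The augmentation technique studied here is back-translation: each training sentence is translated into a pivot language and then back into English, yielding paraphrases that can be added to the continued pre-training or fine-tuning corpus. The snippet below is a minimal sketch of that idea, not the authors' actual pipeline; it assumes the Hugging Face transformers library (with sentencepiece installed) and uses the Helsinki-NLP Opus-MT English-German models as an illustrative pivot pair.

    from transformers import MarianMTModel, MarianTokenizer

    def load_pair(name):
        # Load one Opus-MT translation model and its tokenizer.
        tokenizer = MarianTokenizer.from_pretrained(name)
        model = MarianMTModel.from_pretrained(name)
        return tokenizer, model

    # English -> German and German -> English models form the round trip.
    en_de_tok, en_de_model = load_pair("Helsinki-NLP/opus-mt-en-de")
    de_en_tok, de_en_model = load_pair("Helsinki-NLP/opus-mt-de-en")

    def translate(sentences, tokenizer, model):
        batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**batch)
        return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

    def back_translate(sentences):
        # Round-trip English sentences through German to obtain paraphrases.
        german = translate(sentences, en_de_tok, en_de_model)
        return translate(german, de_en_tok, de_en_model)

    augmented = back_translate(["Data augmentation can improve fine-tuning performance."])
    print(augmented)

In a setup like the one described in the abstract, the resulting paraphrases would serve as additional unlabelled text for continued pre-training, or as extra labelled examples during fine-tuning.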

