Data Augmentation Approaches in Natural Language Processing: A Survey

10/05/2021
by Bohan Li, et al.

As an effective strategy, data augmentation (DA) alleviates data scarcity in scenarios where deep learning techniques may fail. It was first widely applied in computer vision and then introduced to natural language processing, where it has achieved improvements in many tasks. One of the main focuses of DA methods is to improve the diversity of training data, thereby helping the model generalize better to unseen test data. In this survey, we frame DA methods into three categories based on the diversity of the augmented data: paraphrasing, noising, and sampling. We analyze DA methods in detail according to these categories, and we also introduce their applications in NLP tasks as well as the challenges.
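
To make the noising category concrete, here is a minimal Python sketch. It is not taken from the survey: the function names (`random_deletion`, `random_swap`, `augment`), the parameters, and the choice of whitespace tokenization are all assumptions made for illustration. It shows two common token-level perturbations, random deletion and random swap, applied to one sentence to produce several noisy copies.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # If every token was dropped, fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap two distinct, randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(sentence, n_copies=3, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_copies noisy variants of a sentence (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    tokens = sentence.split()  # assumption: whitespace tokenization
    copies = []
    for _ in range(n_copies):
        noisy = random_swap(random_deletion(tokens, p_delete, rng), n_swaps, rng)
        copies.append(" ".join(noisy))
    return copies


if __name__ == "__main__":
    for variant in augment("data augmentation helps when labeled data is scarce"):
        print(variant)
```

Noising of this kind keeps the label fixed while perturbing the surface form, so the deletion probability and the number of swaps would normally be kept small enough that the sentence meaning is still plausibly preserved.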


research
06/13/2023

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

In recent years, language models (LMs) have made remarkable progress in ...
research
11/29/2021

Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Data augmentation (DA) is a common solution to data scarcity and imbalan...
research
01/02/2021

Substructure Substitution: Structured Data Augmentation for NLP

We study a family of data augmentation methods, substructure substitutio...
research
04/24/2022

EPiDA: An Easy Plug-in Data Augmentation Framework for High Performance Text Classification

Recent works have empirically shown the effectiveness of data augmentati...
research
10/10/2022

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

Many natural language processing (NLP) tasks are naturally imbalanced, a...
research
07/21/2021

An overview of mixing augmentation methods and augmentation strategies

Deep Convolutional Neural Networks have made an incredible progress in m...
research
10/10/2022

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective

Data augmentation (DA) is a powerful workhorse for bolstering performanc...
