A Survey on Data Augmentation for Text Classification

07/07/2021
by Markus Bayer, et al.

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing the generalization capabilities of a model, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used in order to protect privacy. Based on a precise description of the goals and applications of data augmentation (C1) and a taxonomy for existing works (C2), this survey is concerned with data augmentation methods for text classification and aims to provide a concise and comprehensive overview for researchers and practitioners (C3). Following the taxonomy, we divide more than 100 methods into 12 different groupings and provide state-of-the-art references indicating which methods are highly promising (C4). Finally, we give research perspectives that may serve as building blocks for future work (C5).

research
04/05/2023

Performance of Data Augmentation Methods for Brazilian Portuguese Text Classification

Improving machine learning performance while increasing model generaliza...
research
02/17/2022

Graph Data Augmentation for Graph Machine Learning: A Survey

Data augmentation has recently seen increased interest in graph machine ...
research
04/26/2019

A Survey on Face Data Augmentation

The quality and size of training set have great impact on the results of...
research
07/18/2022

Research Trends and Applications of Data Augmentation Algorithms

In the Machine Learning research community, there is a consensus regardi...
research
03/29/2019

Informed Machine Learning - Towards a Taxonomy of Explicit Integration of Knowledge into Machine Learning

Despite the great successes of machine learning, it can have its limits ...
research
03/13/2023

Boosting Source Code Learning with Data Augmentation: An Empirical Study

The next era of program understanding is being propelled by the use of m...
research
01/21/2021

DataLoc+: A Data Augmentation Technique for Machine Learning in Room-Level Indoor Localization

Indoor localization has been a hot area of research over the past two de...
