EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

by Jason W. Wei et al.

We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
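The four operations named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' released code: the paper draws synonyms from WordNet, while here a tiny hand-made synonym table (`SYNONYMS`) stands in for it, and the function names and the `rng` parameter are choices made for this sketch.

```python
import random

# Toy synonym table standing in for WordNet (an assumption of this sketch;
# the paper uses WordNet to find synonyms).
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "joyful"]}

def synonym_replacement(words, n, rng=random):
    """Replace up to n words that have known synonyms."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng=random):
    """Insert n synonyms of random in-sentence words at random positions."""
    out = list(words)
    for _ in range(n):
        donors = [w for w in out if w in SYNONYMS]
        if not donors:
            break
        syn = rng.choice(SYNONYMS[rng.choice(donors)])
        out.insert(rng.randrange(len(out) + 1), syn)
    return out

def random_swap(words, n, rng=random):
    """Swap the positions of two randomly chosen words, n times."""
    out = list(words)
    for _ in range(n):
        i, j = rng.randrange(len(out)), rng.randrange(len(out))
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng=random):
    """Drop each word independently with probability p; keep at least one."""
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]
```

A point worth noting from the paper's ablations: these operations are applied with small per-word probabilities (the authors suggest around 0.1), since heavier augmentation starts to destroy label-preserving meaning.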




Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

Few-shot text classification is a fundamental NLP task in which a model ...

AEDA: An Easier Data Augmentation Technique for Text Classification

This paper proposes AEDA (An Easier Data Augmentation) technique to help...

You Only Cut Once: Boosting Data Augmentation with a Single Cut

We present You Only Cut Once (YOCO) for performing data augmentations. Y...

Concept Matching for Low-Resource Classification

We propose a model to tackle classification tasks in the presence of ver...

ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification

Data augmentation has been an important ingredient for boosting performa...

Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

We introduce a data augmentation technique based on byte pair encoding a...

On the Generalization Effects of Linear Transformations in Data Augmentation

Data augmentation is a powerful technique to improve performance in appl...