Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

09/01/2021
by Shuhuai Ren, et al.

Data augmentation enriches training samples to alleviate overfitting in low-resource or class-imbalanced situations. Traditional methods first devise task-specific operations such as Synonym Substitute, then manually preset the corresponding parameters such as the substitution rate, which requires a lot of prior knowledge and is prone to sub-optimal results. Moreover, previous methods use a limited number of editing operations, which decreases the diversity of the augmented data and thus restricts the performance gain. To overcome these limitations, we propose a framework named Text AutoAugment (TAA) that establishes a compositional and learnable paradigm for data augmentation. We regard a combination of various operations as an augmentation policy and use an efficient Bayesian Optimization algorithm to automatically search for the best policy, which substantially improves the generalization capability of models. Experiments on six benchmark datasets show that TAA boosts classification accuracy in low-resource and class-imbalanced regimes by an average of 8.8% and 9.7%, respectively, outperforming strong baselines.
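To make the notion of a compositional policy concrete, the following is a minimal sketch, not the authors' implementation. It assumes a policy is an ordered list of (operation, probability, magnitude) triples applied to a whitespace-tokenized sentence. The two toy operations (`random_delete`, `random_swap`), the helper names, and the parameter ranges are all hypothetical choices made for illustration. The search step is plain random search standing in for the Bayesian Optimization described above, and `score` is a placeholder for whatever validation metric one would use.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: (operation name, probability, magnitude).
Policy = List[Tuple[str, float, float]]


def random_delete(tokens: List[str], magnitude: float, rng: random.Random) -> List[str]:
    """Drop each token with probability `magnitude`, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= magnitude]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens: List[str], magnitude: float, rng: random.Random) -> List[str]:
    """Swap round(magnitude * len(tokens)) randomly chosen pairs of positions."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(round(magnitude * len(out))):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


# Toy operation set (an assumption; the paper's operation set is not listed here).
OPERATIONS = {"random_delete": random_delete, "random_swap": random_swap}


def apply_policy(text: str, policy: Policy, rng: random.Random) -> str:
    """Apply each operation in order, each firing with its own probability."""
    tokens = text.split()
    for name, prob, magnitude in policy:
        if tokens and rng.random() < prob:
            tokens = OPERATIONS[name](tokens, magnitude, rng)
    return " ".join(tokens)


def sample_policy(rng: random.Random, length: int = 2) -> Policy:
    """Draw a random policy; the magnitude range [0, 0.5] is an arbitrary choice."""
    names = sorted(OPERATIONS)
    return [(rng.choice(names), rng.random(), rng.uniform(0.0, 0.5)) for _ in range(length)]


def search_policy(score: Callable[[Policy], float], trials: int, rng: random.Random) -> Policy:
    """Random-search stand-in for the policy search (NOT Bayesian Optimization)."""
    candidates = [sample_policy(rng) for _ in range(trials)]
    return max(candidates, key=score)


# Usage: apply a hand-written policy to one sentence.
rng = random.Random(0)
policy = [("random_swap", 0.5, 0.2), ("random_delete", 0.5, 0.1)]
print(apply_policy("the movie was surprisingly good", policy, rng))
```

In a real setting, `score` would train a classifier on data augmented by the candidate policy and return its validation accuracy, and a Bayesian optimizer would replace the random sampling so that earlier evaluations guide which policies are tried next.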

