Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime

05/16/2023
by Junfan Chen, et al.

Data augmentation is widely used in text classification, especially in the low-resource regime where only a few examples per class are available during training. Despite this success, generating augmentations that act as hard positive examples, which may make them more effective, remains under-explored. This paper proposes an Adversarial Word Dilution (AWD) method that generates hard positive examples as text data augmentations to train low-resource text classification models efficiently. The idea is to dilute the embeddings of strong positive words by weighted mixing with the unknown-word embedding, making the augmented inputs hard for the classification model to recognize as positive. The dilution weights are learned adversarially through a constrained min-max optimization process guided by the labels. Empirical studies on three benchmark datasets show that AWD generates more effective data augmentations and outperforms state-of-the-art text data augmentation methods. Additional analysis demonstrates that the augmentations generated by AWD are interpretable and can be flexibly extended to new examples without further training.
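The weighted mixing described in the abstract can be illustrated with a minimal sketch. It assumes the dilution is a convex combination of each word embedding with the unknown-word embedding, with one weight per word in [0, 1]; the exact formulation, the constraint on the weights, and the adversarial procedure that learns them are not given in the abstract and are not reproduced here. The names `dilute` and `dilute_sentence` and the toy vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def dilute(word_emb: Vector, unk_emb: Vector, weight: float) -> Vector:
    """Mix one word embedding with the unknown-word embedding.

    Assumed form: (1 - weight) * word_emb + weight * unk_emb.
    weight = 0 leaves the word unchanged; weight = 1 replaces it
    entirely with the unknown-word embedding.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(word_emb) != len(unk_emb):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - weight) * w + weight * u for w, u in zip(word_emb, unk_emb)]


def dilute_sentence(
    embs: List[Vector], unk_emb: Vector, weights: List[float]
) -> List[Vector]:
    """Apply per-word dilution to a sequence of word embeddings."""
    if len(embs) != len(weights):
        raise ValueError("need exactly one weight per word")
    return [dilute(e, unk_emb, w) for e, w in zip(embs, weights)]


# Toy example: two words with 2-dimensional embeddings.
# The weights are fixed by hand here; in AWD they would be learned.
sentence = [[1.0, 0.0], [0.0, 2.0]]
unk = [0.0, 0.0]
augmented = dilute_sentence(sentence, unk, [0.5, 0.0])
print(augmented)
```

In this toy run the first word is pulled halfway toward the unknown-word embedding while the second is left untouched. A larger weight on a strongly class-indicative word would make the augmented input harder to classify, which is the intuition behind a hard positive example.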

research
09/04/2022

Selective Text Augmentation with Word Roles for Low-Resource Text Classification

Data augmentation techniques are widely used in text classification task...
research
03/21/2019

Low Resource Text Classification with ULMFit and Backtranslation

In computer vision, virtually every state of the art deep learning syste...
research
04/02/2019

Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text

Lemmatization aims to reduce the sparse data problem by relating the inf...
research
09/01/2021

What Have Been Learned What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

Text augmentation techniques are widely used in text classification prob...
research
09/01/2021

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

Data augmentation aims to enrich training samples for alleviating the ov...
research
04/27/2023

ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT

In this paper, we investigate the use of data obtained from prompting a ...
research
09/24/2021

A Diversity-Enhanced and Constraints-Relaxed Augmentation for Low-Resource Classification

Data augmentation (DA) aims to generate constrained and diversified data...
