A Diversity-Enhanced and Constraints-Relaxed Augmentation for Low-Resource Classification

09/24/2021
by Guang Liu, et al.

Data augmentation (DA) aims to generate constrained and diversified data to improve classifiers in Low-Resource Classification (LRC). Previous studies mostly use a fine-tuned Language Model (LM) to strengthen the constraints but overlook that greater diversity could make the generated data more effective. In LRC, strong constraints combined with weak diversity in DA lead to classifiers that generalize poorly. To address this dilemma, we propose a Diversity-Enhanced and Constraints-Relaxed Augmentation (DECRA). DECRA has two essential components on top of a transformer-based backbone model. 1) A k-beta augmentation is proposed to enhance diversity when generating constrained data. It expands the scope of changes and increases the complexity of the generated data. 2) A masked language model loss, instead of fine-tuning, is used as a regularization. It relaxes constraints so that the classifier can be trained with more scattered generated data. Together, these two components generate data that can reach or approach category boundaries and hence help the classifier generalize better. We evaluate DECRA on three public benchmark datasets under low-resource settings. Extensive experiments demonstrate that DECRA outperforms state-of-the-art approaches by 3.8% in the overall score.
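
The abstract says a masked language model loss is used as a regularization but gives no formula. The sketch below is one plausible reading, not the authors' implementation: it assumes the regularizer is added to the classification loss with a scalar weight. The function names and the weight `lam` are hypothetical and do not come from the paper.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of index `target` under the probability vector `probs`."""
    return -math.log(probs[target])


def combined_loss(class_probs, label, token_probs, masked_targets, lam=0.5):
    """Classification loss plus a weighted masked-LM term (assumed additive form).

    class_probs    -- predicted class distribution for one example
    label          -- gold class index
    token_probs    -- one vocabulary distribution per masked position
    masked_targets -- original token id at each masked position
    lam            -- hypothetical regularization weight, not taken from the paper
    """
    cls_loss = cross_entropy(class_probs, label)
    if masked_targets:
        # Average the masked-LM loss over the masked positions.
        mlm_loss = sum(
            cross_entropy(p, t) for p, t in zip(token_probs, masked_targets)
        ) / len(masked_targets)
    else:
        # No masked positions: the regularizer contributes nothing.
        mlm_loss = 0.0
    return cls_loss + lam * mlm_loss
```

With `lam=0` this reduces to plain classification training; larger values weight the language-modeling term more heavily relative to the classification term.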

research
05/04/2021

Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution

In this paper, we investigate the driving factors behind concatenation, ...
research
05/16/2023

Adversarial Word Dilution as Text Data Augmentation in Low-Resource Regime

Data augmentation is widely used in text classification, especially in t...
research
08/15/2022

Syntax-driven Data Augmentation for Named Entity Recognition

In low resource settings, data augmentation strategies are commonly leve...
research
10/16/2022

A Policy-based Approach to the SpecAugment Method for Low Resource E2E ASR

SpecAugment is a very effective data augmentation method for both HMM an...
research
04/15/2023

Robust Educational Dialogue Act Classifiers with Low-Resource and Imbalanced Datasets

Dialogue acts (DAs) can represent conversational actions of tutors or st...
research
09/01/2021

Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification

Data augmentation aims to enrich training samples for alleviating the ov...
research
06/06/2022

Global Mixup: Eliminating Ambiguity with Clustering

Data augmentation with Mixup has been proven an effective method to regu...
