Selective Text Augmentation with Word Roles for Low-Resource Text Classification

09/04/2022
by Biyang Guo, et al.

Data augmentation techniques are widely used in text classification to improve classifier performance, especially in low-resource scenarios. Most previous methods augment text without considering the different functions of the words in it, which can produce unsatisfactory samples. Because different words play different roles in text classification, we propose to strategically select the appropriate roles for text augmentation. In this work, we first identify the relationships between the words in a text and the text's category from two perspectives, statistical correlation and semantic similarity, and then use them to divide the words into four roles (Gold, Venture, Bonus, and Trivial words), each with a different function for text classification. Based on these word roles, we present a new augmentation technique called STA (Selective Text Augmentation), in which different text-editing operations are selectively applied to words with specific roles. STA generates diverse and relatively clean samples while preserving the original core semantics, and it is simple to implement. Extensive experiments on 5 benchmark low-resource text classification datasets show that the augmented samples produced by STA boost the performance of classification models and significantly outperform previous non-selective methods, including two large language model-based techniques. Cross-dataset experiments further indicate that STA helps classifiers generalize to other datasets better than previous methods.
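The abstract does not specify how the two scores are computed, which thresholds separate the roles, or which editing operations STA uses, so the following is only a minimal sketch under stated assumptions. Statistical correlation is approximated here by P(label | word) estimated from document counts; semantic similarity is taken as a precomputed score per word (for example, a hypothetical embedding cosine between the word and the label name). The helper names `label_correlation`, `assign_roles`, and `selective_delete`, the thresholds, the mapping of the two mixed quadrants to Venture and Bonus, and the "delete only Trivial words" operation are all illustrative assumptions, not the paper's definitions.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Corpus = Sequence[Tuple[List[str], str]]


def label_correlation(corpus: Corpus, label: str) -> Dict[str, float]:
    """Hypothetical statistical-correlation proxy: the fraction of documents
    containing a word that carry `label` (an estimate of P(label | word))."""
    with_label: Counter = Counter()
    total: Counter = Counter()
    for tokens, y in corpus:
        for w in set(tokens):  # count each word at most once per document
            total[w] += 1
            if y == label:
                with_label[w] += 1
    return {w: with_label[w] / total[w] for w in total}


def assign_roles(
    words: List[str],
    sc: Dict[str, float],
    ss: Dict[str, float],
    sc_thr: float = 0.5,
    ss_thr: float = 0.5,
) -> Dict[str, str]:
    """Split words into four roles by thresholding two scores.
    ASSUMPTION: the quadrant-to-role mapping below is illustrative."""
    roles: Dict[str, str] = {}
    for w in words:
        high_sc = sc.get(w, 0.0) >= sc_thr
        high_ss = ss.get(w, 0.0) >= ss_thr
        if high_sc and high_ss:
            roles[w] = "Gold"
        elif high_sc:
            roles[w] = "Venture"  # assumed: statistically tied, semantically distant
        elif high_ss:
            roles[w] = "Bonus"  # assumed: semantically close, statistically weak
        else:
            roles[w] = "Trivial"
    return roles


def selective_delete(
    tokens: List[str], roles: Dict[str, str], p: float = 0.5, seed: int = 0
) -> List[str]:
    """Hypothetical selective operation: drop each Trivial word with
    probability p and keep words of every other role untouched."""
    rng = random.Random(seed)
    return [t for t in tokens if roles.get(t) != "Trivial" or rng.random() >= p]


corpus = [
    ("great acting and great plot".split(), "pos"),
    ("the plot was boring".split(), "neg"),
    ("great cast".split(), "pos"),
    ("boring and slow".split(), "neg"),
]
sc = label_correlation(corpus, "pos")
ss = {"great": 0.9, "acting": 0.2, "plot": 0.1, "and": 0.0, "fun": 0.8}  # made-up scores
roles = assign_roles(["great", "acting", "and", "plot", "fun"], sc, ss, sc_thr=0.6)
print(roles)
print(selective_delete("great acting and great plot".split(), roles, p=1.0))
```

With `p=1.0` every Trivial word is removed while the Gold and Venture words survive, which illustrates the intended property of a selective operation: the edit touches only words whose role suggests they carry little class information, so the core semantics tied to the label are preserved.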
