What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification

09/01/2021
by Biyang Guo, et al.

Text augmentation techniques are widely used in text classification to improve classifier performance, especially in low-resource scenarios. Although many creative text augmentation methods have been designed, they augment text non-selectively: less important or noisy words are as likely to be augmented as informative words, which limits the benefit of augmentation. In this work, we systematically summarize three kinds of role keywords, which serve different functions in text classification, and design effective methods to extract them from the text. Based on these extracted role keywords, we propose STA (Selective Text Augmentation), which selectively augments the text so that informative, class-indicating words are emphasized while irrelevant or noisy words are diminished. Extensive experiments on four English and Chinese text classification benchmark datasets demonstrate that STA can substantially outperform non-selective text augmentation methods.
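The contrast between selective and non-selective augmentation can be illustrated with a minimal sketch. It assumes a set of class-indicating keywords is already available; the function name `selective_delete`, the example sentence, and the keyword set are hypothetical, and this is not the authors' STA implementation or their keyword extraction method. It only shows one selective operation: random deletion that never removes a keyword.

```python
import random


def selective_delete(tokens, keywords, p_drop=0.3, seed=None):
    """Randomly drop non-keyword tokens; always keep keyword tokens.

    tokens   -- list of word strings
    keywords -- set of lowercase words assumed to be class-indicating
    p_drop   -- probability of dropping each non-keyword token
    """
    rng = random.Random(seed)
    kept = [
        t for t in tokens
        if t.lower() in keywords or rng.random() >= p_drop
    ]
    # Fall back to the original tokens if everything was dropped.
    return kept if kept else list(tokens)


# Hypothetical example: a sentiment sentence and its assumed keywords.
tokens = "the movie was honestly a brilliant and moving piece of cinema".split()
keywords = {"brilliant", "moving"}
augmented = selective_delete(tokens, keywords, p_drop=0.5, seed=0)
print(" ".join(augmented))
```

A non-selective variant would apply the same drop probability to every token, so "brilliant" could be deleted as easily as "the". Restricting deletion to non-keywords is one simple way to keep the class signal intact in the augmented sample.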
