Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks

02/28/2022
by   Xing Wu, et al.

Before entering a neural network, a token is generally converted to its one-hot representation, a discrete distribution over the vocabulary. A smoothed representation, by contrast, is the distribution over candidate tokens produced by a pre-trained masked language model, which can serve as a more informative substitute for the one-hot representation. We propose an efficient data augmentation method, termed text smoothing, that converts a sentence from its one-hot representation to a controllable smoothed representation. We evaluate text smoothing on different benchmarks in a low-resource regime. Experimental results show that text smoothing outperforms various mainstream data augmentation methods by a substantial margin. Moreover, text smoothing can be combined with those methods to achieve even better performance.
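The core idea can be sketched in a few lines: interpolate the one-hot vector with a masked-language-model distribution using a mixing coefficient. This is a minimal illustration, not the paper's implementation; the `mlm_probs` values below are stand-ins for what a real pre-trained masked language model would output, and the coefficient name `lam` is our own.

```python
def smooth(one_hot, mlm_probs, lam=0.5):
    # Convex combination of the two distributions: lam=1 recovers the
    # one-hot input; smaller lam leans more on the MLM's predictions,
    # so lam controls the degree of smoothing.
    return [lam * o + (1 - lam) * p for o, p in zip(one_hot, mlm_probs)]

vocab = ["the", "cat", "sat", "dog", "mat"]   # toy 5-word vocabulary
one_hot = [0, 1, 0, 0, 0]                     # observed token: "cat"
mlm_probs = [0.05, 0.60, 0.05, 0.25, 0.05]    # stand-in MLM output

smoothed = smooth(one_hot, mlm_probs, lam=0.4)
# smoothed is still a valid distribution (entries sum to 1), but now
# assigns mass to plausible substitutes such as "dog".
```

Because both inputs sum to one, any convex combination remains a valid probability distribution, which is what lets the smoothed representation drop in wherever a one-hot vector was expected.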


