Exploring Data Augmentation Methods on Social Media Corpora

03/03/2023
by Isabel Garcia Pietri, et al.

Data augmentation has proven widely effective in computer vision. In Natural Language Processing (NLP), data augmentation remains an area of active research: no widely accepted augmentation technique works well across tasks and model architectures. In this paper we explore data augmentation techniques for text classification using two social media datasets. We explore popular varieties of data augmentation, starting with oversampling, Easy Data Augmentation (Wei and Zou, 2019) and Back-Translation (Sennrich et al., 2015). We also consider Greyscaling, a relatively unexplored data augmentation technique that seeks to mitigate the intensity of adjectives in examples. Finally, we consider a few-shot learning approach: Pattern-Exploiting Training (PET) (Schick et al., 2020). For the experiments we use a BERT transformer architecture. Results show that augmentation techniques provide only minimal and inconsistent improvements. Synonym replacement showed some evidence of performance improvement, and the use of adjective scales with Greyscaling is an area where further exploration would be valuable. Few-shot learning experiments show consistent improvement over supervised training and seem very promising when classes are easily separable, although this approach also warrants further exploration.
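
For readers unfamiliar with the simpler techniques named above, the following is a minimal sketch of random oversampling and of two of the four Easy Data Augmentation operations (random swap and random deletion; synonym replacement and random insertion need a thesaurus and are omitted). It is not the authors' implementation: the function names, parameters, and toy data are assumptions made purely for illustration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times (EDA-style random swap)."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def oversample(examples, rng):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    tokens = "this movie was not very good".split()
    print(random_swap(tokens, 2, rng))
    print(random_deletion(tokens, 0.3, rng))
    # Hypothetical toy dataset: one positive example, three negative ones.
    toy = [("great", 1), ("bad", 0), ("awful", 0), ("poor", 0)]
    print(oversample(toy, rng))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which matters when comparing small and inconsistent gains of the kind the abstract reports.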

research
02/24/2023

HULAT at SemEval-2023 Task 10: Data augmentation for pre-trained transformers applied to the detection of sexism in social media

This paper describes our participation in SemEval-2023 Task 10, whose go...
research
10/05/2020

Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks

Mixup is the latest data augmentation technique that linearly interpolat...
research
10/05/2020

How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?

Task-agnostic forms of data augmentation have proven widely effective in...
research
04/10/2020

MA3: Model Agnostic Adversarial Augmentation for Few Shot learning

Despite the recent developments in vision-related problems using deep ne...
research
09/20/2023

AttentionMix: Data augmentation method that relies on BERT attention mechanism

The Mixup method has proven to be a powerful data augmentation technique...
research
10/31/2021

PnPOOD: Out-Of-Distribution Detection for Text Classification via Plug and Play Data Augmentation

While Out-of-distribution (OOD) detection has been well explored in comp...
research
05/17/2021

A Fusion-Denoising Attack on InstaHide with Data Augmentation

InstaHide is a state-of-the-art mechanism for protecting private trainin...