Data Augmentation using Pre-trained Transformer Models

03/04/2020
by Varun Kumar, et al.

Pre-trained language models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer-based models, including auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART), for conditional data augmentation. We show that prepending the class label to each text sequence provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained seq2seq model outperforms the other models. Further, we explore how data augmentation based on different pre-trained models differs in terms of data diversity, and how well such methods preserve the class-label information.
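The label-prepending scheme described in the abstract can be sketched as plain string formatting: each training example is rewritten as "label, separator, text" before fine-tuning the language model, and at augmentation time the model is prompted with just the label prefix. A minimal sketch follows; the specific separator and end-of-sequence token names are assumptions for illustration, not the paper's exact tokens.

```python
# Hypothetical special tokens; the actual tokens depend on the model's tokenizer.
SEP = "<SEP>"
EOS = "<EOS>"

def make_finetune_example(label: str, text: str) -> str:
    # Condition the LM by prepending the class label to the sequence.
    return f"{label} {SEP} {text} {EOS}"

def make_generation_prompt(label: str) -> str:
    # At augmentation time, prompt with only the label prefix so the
    # fine-tuned model generates a new example of that class.
    return f"{label} {SEP}"

def extract_text(generated: str) -> str:
    # Recover the synthetic example: drop the label prefix and EOS token.
    body = generated.split(SEP, 1)[1]
    return body.split(EOS, 1)[0].strip()
```

For example, `make_finetune_example("positive", "a great movie")` yields the conditioned sequence used for fine-tuning, and `make_generation_prompt("positive")` is the prompt from which new positive-class examples would be sampled.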

Related research

10/05/2020  Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks
            Mixup is the latest data augmentation technique that linearly interpolat...

03/12/2021  Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability
            In this paper, we investigate whether the power of the models pre-traine...

10/23/2022  Automated Essay Scoring using Transformers
            Despite being investigated for over five decades, the task of automated ...

12/04/2020  RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation
            Can AI help automate human-easy but computer-hard data preparation tasks...

02/04/2023  Semantic-Guided Image Augmentation with Pre-trained Models
            Image augmentation is a common mechanism to alleviate data scarcity in c...

09/25/2020  A little goes a long way: Improving toxic language classification despite data scarcity
            Detection of some types of toxic language is hampered by extreme scarcit...

01/20/2023  Data Augmentation for Modeling Human Personality: The Dexter Machine
            Modeling human personality is important for several AI challenges, from ...
