Not Enough Data? Deep Learning to the Rescue!

11/08/2019
by Ateret Anaby-Tavor, et al.

Based on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to synthesize new labeled data for supervised classification, focusing mainly on cases where labeled data is scarce. Our method, language-model-based data augmentation (LAMBADA), fine-tunes a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Given a class label, the fine-tuned model then generates new sentences for that class. These sentences are filtered using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon state-of-the-art data augmentation techniques, specifically those applicable to text classification tasks with little data.
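
The filtering step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact filtering criterion, so the rule used here (keep a generated sentence only if the baseline classifier predicts its intended label, then keep the top_k most confident per class) is an assumption. The tiny naive Bayes classifier, the function name filter_synthetic, and the hard-coded candidate sentences (standing in for the output of a fine-tuned generator) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing.

    Stand-in for the baseline classifier trained on the original data.
    """

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Normalize log scores into probabilities.
        peak = max(log_scores.values())
        exp = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


def filter_synthetic(classifier, candidates, top_k):
    """Keep generated sentences the classifier assigns to their intended label.

    candidates: list of (sentence, intended_label) pairs.
    Returns up to top_k sentences per label, most confident first.
    (Assumed criterion; the abstract does not specify one.)
    """
    kept = defaultdict(list)
    for sentence, label in candidates:
        probs = classifier.predict_proba(sentence)
        if max(probs, key=probs.get) == label:
            kept[label].append((probs[label], sentence))
    return {
        label: [s for _, s in sorted(items, reverse=True)[:top_k]]
        for label, items in kept.items()
    }


# Small original labeled set (toy data for illustration only).
train_texts = [
    "book a flight to paris",
    "cancel my flight booking",
    "what is the weather today",
    "will it rain tomorrow",
]
train_labels = ["travel", "travel", "weather", "weather"]

# Hypothetical generator output: (sentence, label it was generated for).
candidates = [
    ("book a flight for tomorrow", "travel"),
    ("cancel my flight", "travel"),
    ("what is the flight weather", "travel"),
    ("will it rain in paris", "weather"),
]

baseline = NaiveBayes().fit(train_texts, train_labels)
selected = filter_synthetic(baseline, candidates, top_k=1)
print(selected)
```

In a fuller pipeline the selected sentences would be added to the original training set and the classifier retrained; the generator itself would be a fine-tuned language model rather than a fixed list.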

research
05/18/2022

PromptDA: Label-guided Data Augmentation for Prompt-based Few Shot Learners

Recent advances on large pre-trained language models (PLMs) lead impress...
research
01/20/2023

Data Augmentation for Modeling Human Personality: The Dexter Machine

Modeling human personality is important for several AI challenges, from ...
research
12/17/2018

Conditional BERT Contextual Augmentation

We propose a novel data augmentation method for labeled sentences called...
research
12/19/2022

Less is More: Parameter-Free Text Classification with Gzip

Deep neural networks (DNNs) are often used for text classification tasks...
research
05/16/2018

Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations

We propose a novel data augmentation for labeled sentences called contex...
research
05/22/2023

Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks

Text classification tasks often encounter few shot scenarios with limite...
research
09/25/2020

A little goes a long way: Improving toxic language classification despite data scarcity

Detection of some types of toxic language is hampered by extreme scarcit...
