ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification

12/16/2021
by Hazel Kim, et al.

Data augmentation is an important ingredient for boosting the performance of learned models. Prior data augmentation methods for few-shot text classification have led to large performance gains. However, they were not designed to capture the intricate compositional structure of natural language, and as a result they fail to generate samples with plausible and diverse sentence structures. Motivated by this, we present data Augmentation using Lexicalized Probabilistic context-free grammars (ALP), which generates augmented samples with diverse syntactic structures and plausible grammar. The lexicalized PCFG parse trees consider both constituents and dependencies to produce a syntactic frame that maximizes the variety of word choices while preserving syntax, without requiring domain experts. Experiments on few-shot text classification tasks demonstrate that ALP enhances many state-of-the-art classification methods. As a second contribution, we examine train-val splitting methodologies when a data augmentation method comes into play. We argue empirically that the traditional splitting of training and validation sets is sub-optimal compared to our novel augmentation-based splitting strategies, which further expand the training split using the same amount of labeled data. Taken together, our contributions on data augmentation strategies yield a strong training recipe for few-shot text classification tasks.
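
To make the idea of a "syntactic frame" with varied word choices concrete, here is a minimal Python sketch. It is not the ALP implementation: the grammar, probabilities, vocabulary, and function names (`sample_frame`, `fill_frame`, `augment`) are all invented for illustration. It also simplifies heavily, since words are drawn per part-of-speech tag only, whereas a true lexicalized PCFG conditions rules on head words.

```python
import random

# Toy grammar: nonterminal -> list of (expansion, probability).
# All rules, weights, and words are hypothetical, chosen only for illustration.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("DET", "N"), 0.7), (("DET", "ADJ", "N"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
}

# Toy lexicon: preterminal tag -> list of (word, probability).
LEXICON = {
    "DET": [("the", 0.6), ("a", 0.4)],
    "ADJ": [("new", 0.5), ("slow", 0.5)],
    "N": [("movie", 0.5), ("plot", 0.3), ("actor", 0.2)],
    "V": [("impresses", 0.5), ("bores", 0.5)],
}


def weighted_choice(options, rng):
    """Pick one item from a list of (item, weight) pairs."""
    items, weights = zip(*options)
    return rng.choices(items, weights=weights, k=1)[0]


def sample_frame(symbol, rng):
    """Expand a symbol into a list of preterminal tags (the syntactic frame)."""
    if symbol in LEXICON:
        return [symbol]
    frame = []
    for child in weighted_choice(GRAMMAR[symbol], rng):
        frame.extend(sample_frame(child, rng))
    return frame


def fill_frame(frame, rng):
    """Fill each tag in the frame with a word sampled from the lexicon."""
    return " ".join(weighted_choice(LEXICON[tag], rng) for tag in frame)


def augment(n, seed=0):
    """Generate n synthetic sentences: sample a frame, then sample its words."""
    rng = random.Random(seed)
    return [fill_frame(sample_frame("S", rng), rng) for _ in range(n)]


if __name__ == "__main__":
    for sentence in augment(5):
        print(sentence)
```

Separating frame sampling from word filling mirrors the abstract's description only loosely: the frame fixes the syntax, and the lexicon supplies the variety of word choices within it.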

