Substructure Substitution: Structured Data Augmentation for NLP

01/02/2021
by Haoyue Shi, et al.

We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks. SUB2 generates new examples by replacing substructures (e.g., subtrees or subsequences) with others that carry the same label, and it can be applied to many structured NLP tasks such as part-of-speech tagging and parsing. For more general tasks (e.g., text classification) that do not have explicitly annotated substructures, we present variations of SUB2 based on constituency parse trees, introducing structure-aware data augmentation methods to general NLP tasks. In most cases, training on the SUB2-augmented dataset achieves better performance than training on the original training set. Further experiments show that SUB2 performs more consistently than the other augmentation methods investigated, across different tasks and sizes of the seed dataset.
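The substitution idea can be illustrated for part-of-speech tagging with a minimal sketch. It is not the authors' implementation: it assumes that a span's "label" is simply its tag sequence, and the function names (`build_span_index`, `substitute`) and the `max_len` parameter are hypothetical choices made for this example.

```python
import random
from collections import defaultdict


def build_span_index(dataset, max_len=3):
    """Map each tag sequence (assumed span label) to the word spans seen with it."""
    index = defaultdict(set)
    for words, tags in dataset:
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[tuple(tags[i:i + n])].add(tuple(words[i:i + n]))
    return index


def substitute(example, index, rng, max_len=3):
    """Replace one random span with a different span that has the same tag sequence."""
    words, tags = example
    if not words:
        return example
    n = rng.randint(1, min(max_len, len(words)))  # span length
    i = rng.randint(0, len(words) - n)            # span start
    label = tuple(tags[i:i + n])
    original = tuple(words[i:i + n])
    # Sort so that the choice depends only on the seed of rng.
    candidates = sorted(index.get(label, set()) - {original})
    if not candidates:
        return example  # nothing to swap in; keep the example unchanged
    new_span = rng.choice(candidates)
    new_words = words[:i] + list(new_span) + words[i + n:]
    return new_words, list(tags)  # tags are unchanged by construction


# Toy seed dataset (hypothetical sentences, Universal POS style tags).
seed = [
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "barks"], ["DET", "NOUN", "VERB"]),
]
span_index = build_span_index(seed)
augmented = [substitute(ex, span_index, random.Random(0)) for ex in seed]
```

Because the replacement span has the same tag sequence as the span it replaces, the augmented sentence keeps a valid tag for every token without re-annotation. For tasks without annotated substructures, the analogous step would draw spans and labels from a constituency parse instead, which this sketch does not cover.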


research
10/05/2021

Data Augmentation Approaches in Natural Language Processing: A Survey

As an effective strategy, data augmentation (DA) alleviates data scarcit...
research
12/28/2021

LINDA: Unsupervised Learning to Interpolate in Natural Language Processing

Despite the success of mixup in data augmentation, its applicability to ...
research
06/21/2022

KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP

This paper focuses on text data augmentation for few-shot NLP tasks. The...
research
02/28/2015

The NLP Engine: A Universal Turing Machine for NLP

It is commonly accepted that machine translation is a more complex task ...
research
09/09/2021

Learning with Different Amounts of Annotation: From Zero to Many Labels

Training NLP systems typically assumes access to annotated data that has...
research
07/11/2023

Improved POS tagging for spontaneous, clinical speech using data augmentation

This paper addresses the problem of improving POS tagging of transcripts...
research
05/11/2023

KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment

Recent legislation of the "right to be forgotten" has led to the interes...
