I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection

08/08/2023
by Yongzhu Chang, et al.

Simile detection is a valuable task for many natural language processing (NLP) applications, particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on Word replacement And Sentence completion using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we have compiled a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of our proposed data augmentation method for simile detection.
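
The abstract names three ingredients: word replacement, sentence completion, and an iterative process aimed at quality. The sketch below shows one way such a loop could be wired together; it is not the authors' implementation. The abstract does not say which words are replaced or how quality is judged, so replacing a single-token comparator (such as "like" or "as") and filtering with a score threshold are assumptions. The function `augment`, its parameters, and the stand-ins `toy_complete` and `toy_score` are all hypothetical; in practice `complete` would wrap a GPT-2 generation call and `score` would wrap a simile detector.

```python
from typing import Callable, List, Sequence


def augment(
    seeds: Sequence[str],
    comparators: Sequence[str],
    complete: Callable[[str], str],
    score: Callable[[str], float],
    rounds: int = 2,
    threshold: float = 0.5,
) -> List[str]:
    """Return new sentences from repeated replace -> complete -> filter rounds.

    Assumption: comparators are single whitespace-separated tokens.
    """
    seen = set(seeds)
    frontier = list(seeds)
    accepted: List[str] = []
    for _ in range(rounds):
        next_frontier: List[str] = []
        for sentence in frontier:
            words = sentence.split()
            for i, word in enumerate(words):
                if word not in comparators:
                    continue
                for alt in comparators:
                    if alt == word:
                        continue
                    # Word replacement: swap the comparator and drop the old ending.
                    prefix = " ".join(words[:i] + [alt])
                    # Sentence completion: the language model writes a new ending.
                    candidate = complete(prefix)
                    # Filtering: keep only unseen sentences that score high enough.
                    if candidate not in seen and score(candidate) >= threshold:
                        seen.add(candidate)
                        accepted.append(candidate)
                        next_frontier.append(candidate)
                break  # only the first comparator in each sentence is handled
        # Accepted sentences become the input of the next round.
        frontier = next_frontier
    return accepted


def toy_complete(prefix: str) -> str:
    # Hypothetical stand-in for a GPT-2 call: appends a fixed ending.
    return prefix + " a river"


def toy_score(sentence: str) -> float:
    # Hypothetical stand-in for a simile detector.
    return 1.0 if sentence.endswith("river") else 0.0


result = augment(["time flows like water"], ["like", "as"], toy_complete, toy_score)
```

Passing `complete` and `score` as callables keeps the loop independent of any particular model library, so the same skeleton can be tested with stubs and later connected to a real generator and classifier.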

research
07/04/2020

Text Data Augmentation: Towards better detection of spear-phishing emails

Text data augmentation, i.e. the creation of synthetic textual data from...
research
11/02/2020

An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution

One critical issue of zero anaphora resolution (ZAR) is the scarcity of ...
research
05/12/2022

TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding

Data augmentation is an effective approach to tackle over-fitting. Many ...
research
10/04/2020

Reverse Operation based Data Augmentation for Solving Math Word Problems

Automatically solving math word problems is a critical task in the field...
research
07/12/2022

Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique

We present an efficient framework of corpus for sign language translatio...
research
06/02/2023

Exploring semantic information in disease: Simple Data Augmentation Techniques for Chinese Disease Normalization

The disease is a core concept in the medical field, and the task of norm...
research
07/16/2019

Neural Language Model Based Training Data Augmentation for Weakly Supervised Early Rumor Detection

The scarcity and class imbalance of training data are known issues in cu...
