An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution

11/02/2020
by Ryuto Konno, et al.

One critical issue in zero anaphora resolution (ZAR) is the scarcity of labeled data. This study explores how effectively this problem can be alleviated by data augmentation. We adopt a state-of-the-art data augmentation method, contextual data augmentation (CDA), which generates labeled training instances using a pretrained language model. CDA has been reported to work well for several other natural language processing tasks, including text classification and machine translation. This study addresses two underexplored issues of CDA: how to reduce the computational cost of data augmentation and how to ensure the quality of the generated data. We also propose two methods to adapt CDA to ZAR: [MASK]-based augmentation and linguistically-controlled masking. Experimental results on Japanese ZAR show that our methods contribute to both accuracy gains and reduced computation cost. A closer analysis reveals that the proposed methods improve the quality of the augmented training data compared with conventional CDA.
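As a rough illustration of how CDA's [MASK]-based augmentation can generate new training sentences, the sketch below fills a masked slot with a pretrained masked language model's top predictions. The model name (bert-base-uncased), the English example sentence, and the augment helper are illustrative assumptions only; the paper targets Japanese ZAR with a Japanese pretrained model and additionally applies linguistically-controlled masking, neither of which is reproduced here.

# Minimal sketch of [MASK]-based contextual data augmentation.
# Assumption: a generic HuggingFace masked LM stands in for the Japanese
# model and ZAR training data actually used in the paper.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(masked_sentence: str, top_k: int = 3) -> list[str]:
    """Fill the single [MASK] slot with the LM's top-k predictions and
    return one augmented sentence per prediction."""
    candidates = fill_mask(masked_sentence, top_k=top_k)
    return [c["sequence"] for c in candidates]

# Example: mask one token and let the pretrained LM propose in-context
# replacements, yielding new sentences that preserve the surrounding text.
print(augment("The committee [MASK] the proposal yesterday."))

In this simplified view, each augmented sentence would inherit the original sentence's annotations; the paper's contribution lies in controlling which tokens are masked and filtering the generated data so that the ZAR labels remain valid.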


