Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

10/04/2020
by Dayiheng Liu, et al.

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, where they serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the effectiveness of CRQDA.
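
The three stages named in the abstract (map to a continuous space, revise by gradient, map back to discrete tokens) can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: a hand-written 2-D embedding table stands in for the Transformer autoencoder, a fixed logistic scorer stands in for the pre-trained MRC model, and nearest-neighbour lookup stands in for the decoder. All names and values (`EMBEDDINGS`, `W`, `B`, `score`, `revise`, `decode`) are hypothetical and chosen only for illustration.

```python
import math

# Toy 2-D embedding table; a stand-in for the autoencoder's continuous space.
EMBEDDINGS = {
    "who": [1.0, 0.0],
    "when": [0.0, 1.0],
    "where": [-1.0, 0.0],
}

# Fixed linear scorer; a stand-in for the frozen, pre-trained MRC model.
W = [-1.0, 1.0]
B = 0.0


def score(z):
    """Probability the toy scorer assigns to the target label for vector z."""
    logit = sum(w * x for w, x in zip(W, z)) + B
    return 1.0 / (1.0 + math.exp(-logit))


def revise(z, steps=20, lr=0.5):
    """Gradient ascent on log score(z); only the embedding is updated."""
    z = list(z)
    for _ in range(steps):
        p = score(z)
        # d/dz log(sigmoid(w.z + b)) = (1 - p) * w
        z = [x + lr * (1.0 - p) * w for x, w in zip(z, W)]
    return z


def decode(z):
    """Map a continuous vector back to the closest token in the table."""
    def sq_dist(token):
        return sum((a - b) ** 2 for a, b in zip(EMBEDDINGS[token], z))
    return min(EMBEDDINGS, key=sq_dist)


original = "who"
revised_vec = revise(EMBEDDINGS[original])
revised = decode(revised_vec)
print(original, "->", revised, round(score(revised_vec), 3))
```

The scorer's weights stay fixed and only the embedding moves, which mirrors the abstract's description of revising the question representation rather than retraining the model. In this toy setup the revised vector decodes to a different token than the original.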
