QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation

09/19/2023
by Kunlun Zhu, et al.

Recent years have witnessed the success of question answering (QA), especially its potential to serve as a foundation paradigm for tackling diverse NLP tasks. However, obtaining sufficient data to build an effective and stable QA system remains an open problem. To address this problem, we introduce QASnowball, an iterative bootstrapping framework for QA data augmentation that can iteratively generate large-scale, high-quality QA data from a seed set of supervised examples. Specifically, QASnowball consists of three modules: an answer extractor that extracts core phrases from unlabeled documents as candidate answers, a question generator that generates questions based on the documents and candidate answers, and a QA data filter that retains only high-quality QA data. Moreover, QASnowball can enhance itself by reseeding: the seed set is updated and used to fine-tune the framework in each iteration, leading to continual improvements in generation quality. We conduct experiments in the high-resource English scenario and the medium-resource Chinese scenario, and the results show that the data generated by QASnowball benefits QA models: (1) training models on the generated data achieves results comparable to training on supervised data, and (2) pre-training on the generated data and then fine-tuning on supervised data achieves better performance. Our code and generated data will be released to advance further work.
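The loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how the three modules are built, so `extract_answers`, `generate_question`, and `keep_example` are hypothetical rule-based stand-ins, and the per-iteration fine-tuning step is represented by a comment only. The sketch also assumes that reseeding means adding the filtered examples to the seed set.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class QAExample:
    document: str
    question: str
    answer: str


def extract_answers(document: str) -> List[str]:
    """Hypothetical answer extractor: capitalized words are candidate answers."""
    words = [w.strip(".,") for w in document.split()]
    return [w for w in words if w[:1].isupper()]


def generate_question(document: str, answer: str) -> str:
    """Hypothetical question generator: blank out the answer (cloze style)."""
    return document.replace(answer, "____", 1)


def keep_example(example: QAExample, seen: Set[QAExample]) -> bool:
    """Hypothetical filter: drop duplicates and questions that leak the answer."""
    return (
        "____" in example.question
        and example.answer not in example.question
        and example not in seen
    )


def snowball(seed: List[QAExample], documents: List[str], iterations: int) -> List[QAExample]:
    """Run the extract -> generate -> filter loop, reseeding after each pass."""
    seed_set = list(seed)
    seen = set(seed_set)
    for _ in range(iterations):
        # A real system would fine-tune the three modules on seed_set here.
        for doc in documents:
            for answer in extract_answers(doc):
                example = QAExample(doc, generate_question(doc, answer), answer)
                if keep_example(example, seen):
                    seen.add(example)
                    seed_set.append(example)  # reseeding: output joins the seed set
    return seed_set


if __name__ == "__main__":
    seed = [QAExample("Rome is in Italy.", "____ is in Italy.", "Rome")]
    docs = ["Paris is the capital of France."]
    for ex in snowball(seed, docs, iterations=2):
        print(ex.question, "->", ex.answer)
```

Because the stand-in modules never change, a second iteration adds nothing new here; in the framework described above, fine-tuning on the reseeded data is what would let later iterations produce better examples.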

research
06/12/2019

Unsupervised Question Answering by Cloze Translation

Obtaining training data for Question Answering (QA) is time-consuming an...
research
11/24/2022

Question Answering and Question Generation for Finnish

Recent advances in the field of language modeling have improved the stat...
research
05/09/2022

Few-shot Mining of Naturally Occurring Inputs and Outputs

Creating labeled natural language training data is expensive and require...
research
11/18/2021

How to Build Robust FAQ Chatbot with Controllable Question Generator?

Many unanswerable adversarial questions fool the question-answer (QA) sy...
research
05/06/2020

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

Question Answering (QA) has shown great success thanks to the availabili...
research
04/14/2019

Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering

Recently, a simple combination of passage retrieval using off-the-shelf ...
research
08/23/2022

Unsupervised Question Answering via Answer Diversifying

Unsupervised question answering is an attractive task due to its indepen...
