LIQUID: A Framework for List Question Answering Dataset Generation

02/03/2023
by Seongyun Lee, et al.

Question answering (QA) models often rely on large-scale training datasets, which motivates data generation frameworks that reduce the cost of manual annotation. Although several recent studies have aimed to generate synthetic questions with single-span answers, no study has addressed the creation of list questions with multiple, non-contiguous spans as answers. To address this gap, we propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora. We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers. This allows us to select answers that are semantically correlated in context and are therefore suitable for constructing list questions. We then create questions using an off-the-shelf question generator with the extracted entities and the original passage. Finally, iterative filtering and answer expansion are performed to ensure the accuracy and completeness of the answers. Using our synthetic data, we significantly improve the performance of the previous best list QA models, raising exact-match F1 scores by 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
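The stages named in the abstract can be sketched as a small control loop. This is an illustrative sketch only, not the authors' implementation: the function name `build_list_qa_example` and all four callables are hypothetical, and the abstract does not say how filtering and expansion actually work. The sketch assumes a QA model re-answers the generated question, candidates it fails to recover are dropped (filtering), and extra spans it returns that occur in the passage are added (expansion).

```python
from typing import Callable, Dict, List, Optional


def build_list_qa_example(
    passage: str,
    summarize: Callable[[str], str],
    extract_entities: Callable[[str], List[str]],
    generate_question: Callable[[str, List[str]], str],
    answer_question: Callable[[str, str], List[str]],
    max_rounds: int = 3,
) -> Optional[Dict[str, object]]:
    """Hypothetical sketch: summarize -> extract entities -> generate question
    -> iteratively filter and expand answers. All components are caller-supplied."""
    # Candidate answers come from the summary, not the full passage.
    summary = summarize(passage)
    answers = list(dict.fromkeys(extract_entities(summary)))  # dedupe, keep order
    if len(answers) < 2:
        return None  # a list question needs at least two answers

    # The question is generated from the candidates and the ORIGINAL passage.
    question = generate_question(passage, answers)

    for _ in range(max_rounds):
        predicted = list(dict.fromkeys(answer_question(question, passage)))
        # Assumed filtering rule: keep candidates the QA model also recovers.
        kept = [a for a in answers if a in predicted]
        # Assumed expansion rule: add predicted spans that occur in the passage.
        added = [p for p in predicted if p not in kept and p in passage]
        revised = kept + added
        if len(revised) < 2:
            return None
        if revised == answers:
            break  # answer set is stable, stop iterating
        answers = revised
        question = generate_question(passage, answers)

    return {"question": question, "answers": answers}
```

In practice the callables would wrap a summarization model, a named entity recognizer, a question generator, and a QA model; passing them in as parameters keeps the sketch independent of any particular library.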


research
05/25/2022

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

Existing benchmarks for open-domain question answering (ODQA) typically ...
research
10/08/2019

Generating Highly Relevant Questions

The neural seq2seq based question generation (QG) is prone to generating...
research
09/11/2021

What's in a Name? Answer Equivalence For Open-Domain Question Answering

A flaw in QA evaluation is that annotations often only provide one gold ...
research
10/22/2021

ListReader: Extracting List-form Answers for Opinion Questions

Question answering (QA) is a high-level ability of natural language proc...
research
06/12/2019

Synthetic QA Corpora Generation with Roundtrip Consistency

We introduce a novel method of generating synthetic question answering c...
research
09/29/2020

Sequence-to-Sequence Learning for Indonesian Automatic Question Generator

Automatic question generation is defined as the task of automating the c...
research
10/04/2020

When in Doubt, Ask: Generating Answerable and Unanswerable Questions, Unsupervised

Question Answering (QA) is key for making possible a robust communicatio...
