Harvesting Paragraph-Level Question-Answer Pairs from Wikipedia

05/15/2018
by Xinya Du, et al.

We study the task of generating, from Wikipedia articles, question-answer pairs that cover content beyond a single sentence. We propose a neural network approach that incorporates coreference knowledge via a novel gating mechanism. Compared to models that take only sentence-level information into account (Heilman and Smith, 2010; Du et al., 2017; Zhou et al., 2017), we find that the linguistic knowledge introduced by the coreference representation aids question generation significantly, producing models that outperform the current state of the art. We apply our system (composed of an answer span extraction system and the passage-level QG system) to the 10,000 top-ranking Wikipedia articles and create a corpus of over one million question-answer pairs. We also provide a qualitative analysis of this large-scale corpus generated from Wikipedia.
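
The abstract does not spell out the form of the gating mechanism, so the following is only a generic illustration of what gating a coreference feature into a token representation can look like. It is a minimal sketch, not the authors' model: the function name `gated_fusion`, the sigmoid gate, the concatenation of word and coreference vectors, and the toy weights are all assumptions made for this example.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(word_vec, coref_vec, weights, bias):
    """Hypothetical gate: scale each dimension of [word; coref] by a value in (0, 1).

    The gate for dimension i is sigmoid(weights[i] . combined + bias[i]),
    so the (assumed) learned parameters decide how much of each feature passes.
    """
    combined = list(word_vec) + list(coref_vec)
    gated = []
    for row, b, value in zip(weights, bias, combined):
        gate = sigmoid(sum(w * v for w, v in zip(row, combined)) + b)
        gated.append(gate * value)
    return gated


# Toy inputs: a 2-d word vector and a 2-d coreference feature vector.
word_vec = [0.5, -1.0]
coref_vec = [1.0, 0.0]

# All-zero parameters make every gate exactly sigmoid(0) = 0.5.
weights = [[0.0] * 4 for _ in range(4)]
bias = [0.0] * 4

result = gated_fusion(word_vec, coref_vec, weights, bias)
print(result)
```

With trained (non-zero) parameters, the gate could suppress a coreference feature that is unreliable for a given token and pass it through where it is informative, which is the general motivation for gating rather than plain concatenation.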
