DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension

04/21/2018
by Amrita Saha, et al.

We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7680 pairs of movie plots, where each pair in the collection reflects two versions of the same movie, one from Wikipedia and the other from IMDb, written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the segments containing the answer in the other version. Further, since the two versions have different levels of plot detail, narration style, vocabulary, etc., answering questions from the second version requires deeper language understanding and incorporating external background knowledge. Additionally, the narrative style of passages arising from movie plots (as opposed to the typical descriptive passages in existing datasets) creates the need to perform complex reasoning over events across multiple sentences. Indeed, we observe that state-of-the-art neural RC models, which have achieved near human performance on the SQuAD dataset, exhibit very poor performance on DuoRC even when coupled with traditional NLP techniques to address the challenges it presents (F1 score of 37.42% on DuoRC v/s 86% on the SQuAD dataset). This opens up several interesting research avenues wherein DuoRC could complement other RC datasets to explore novel neural approaches for studying language understanding.
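
The abstract rests on two measurable notions: the F1 score used to compare models, and the low lexical overlap between a question and the passage that answers it. The sketch below illustrates both. It assumes the token-level F1 with the normalization of the SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles "a", "an", "the"); the abstract does not spell out DuoRC's exact evaluation code. The helper `lexical_overlap` is a hypothetical, illustrative measure, not the one used in the paper.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and English
    articles, then collapse runs of whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and a gold answer
    (assumed SQuAD v1.1-style metric)."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def lexical_overlap(question: str, passage: str) -> float:
    """Hypothetical illustration: fraction of the question's distinct
    normalized tokens that also occur in the passage."""
    q_tokens = set(normalize_answer(question).split())
    p_tokens = set(normalize_answer(passage).split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)
```

For example, `lexical_overlap("Who killed the dragon?", "The knight slew the beast.")` is 0.0 because the paraphrase shares no content words with the question. This is the kind of gap DuoRC's two-version design is meant to produce. A same-version passage such as "The knight killed the dragon." scores about 0.67.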
