RC-QED: Evaluating Natural Language Derivations in Multi-Hop Reading Comprehension

10/10/2019
by Naoya Inoue, et al.
Tohoku University
UCL

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This allows systems to "cheat" by employing simple heuristics to answer questions, e.g., by relying on semantic type consistency. As a result, current datasets are not well suited to evaluating RC systems. To address this issue, we introduce RC-QED, a new RC task that requires giving not only the correct answer to a question but also the reasoning employed to arrive at that answer. To this end, we release a large benchmark dataset consisting of 12,000 answers and their corresponding reasoning in the form of natural language derivations. Experiments show that our benchmark is robust to simple heuristics and challenging for state-of-the-art neural path ranking approaches.

04/02/2020

R3: A Reading Comprehension Benchmark Requiring Reasoning Processes

Existing question answering systems can only predict answers without exp...
03/22/2022

VLSP 2021 Shared Task: Vietnamese Machine Reading Comprehension

One of the emerging research trends in natural language understanding is...
03/12/2022

What Makes Reading Comprehension Questions Difficult?

For a natural language understanding benchmark to be useful in research,...
10/22/2021

Challenges in Procedural Multimodal Machine Comprehension:A Novel Way To Benchmark

We focus on Multimodal Machine Reading Comprehension (M3C) where a model...
11/01/2022

CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation

The full power of human language-based communication cannot be realized ...
07/06/2023

KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

Deep text understanding, which requires the connections between a given ...
07/18/2022

MRCLens: an MRC Dataset Bias Detection Toolkit

Many recent neural models have shown remarkable empirical results in Mac...