
RC-QED: Evaluating Natural Language Derivations in Multi-Hop Reading Comprehension

by Naoya Inoue, et al.
Tohoku University

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This allows systems to "cheat" by employing simple heuristics to answer questions, e.g., by relying on semantic type consistency. As a result, current datasets are not well-suited to evaluating RC systems. To address this issue, we introduce RC-QED, a new RC task that requires giving not only the correct answer to a question, but also the reasoning employed to arrive at that answer. For this, we release a large benchmark dataset consisting of 12,000 answers and corresponding reasoning in the form of natural language derivations. Experiments show that our benchmark is robust to simple heuristics and challenging for state-of-the-art neural path ranking approaches.

