RC-QED: Evaluating Natural Language Derivations in Multi-Hop Reading Comprehension

10/10/2019

Recent studies have revealed that reading comprehension (RC) systems learn to exploit annotation artifacts and other biases in current datasets. This allows systems to "cheat" by employing simple heuristics to answer questions, e.g. by relying on semantic type consistency. As a result, current datasets are not well suited to evaluating RC systems. To address this issue, we introduce RC-QED, a new RC task that requires giving not only the correct answer to a question, but also the reasoning employed to arrive at this answer. To this end, we release a large benchmark dataset consisting of 12,000 answers and corresponding reasoning in the form of natural language derivations. Experiments show that our benchmark is robust to simple heuristics and challenging for state-of-the-art neural path ranking approaches.
