Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

by Xanh Ho, et al.

A multi-hop question answering (QA) dataset aims to test reasoning and inference skills by requiring a model to read multiple paragraphs to answer a given question. However, current datasets do not provide a complete explanation of the reasoning process from the question to the answer. Further, previous studies revealed that many examples in existing multi-hop datasets do not require multi-hop reasoning to answer a question. In this study, we present a new multi-hop QA dataset, called 2WikiMultiHopQA, which uses both structured and unstructured data. In our dataset, we introduce evidence information that contains a reasoning path for each multi-hop question. The evidence information has two benefits: (i) it provides a comprehensive explanation for predictions, and (ii) it allows the reasoning skills of a model to be evaluated. We carefully design a pipeline and a set of templates for generating question-answer pairs, so as to guarantee both the multi-hop steps and the quality of the questions. We also exploit the structured format of Wikidata and use logical rules to create questions that are natural but still require multi-hop reasoning. Through experiments, we demonstrate that our dataset is challenging for multi-hop models and that it ensures multi-hop reasoning is required.
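The idea of pairing a logical rule with a question template can be illustrated with a minimal sketch. It assumes that evidence is stored as (subject, relation, object) triples in the style of Wikidata statements; the names `Triple`, `RULES`, `compose`, and `make_example`, the two rules, and the template wording are all hypothetical and are not taken from the dataset's actual pipeline. The entities are placeholders.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One structured statement: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


# Hypothetical rule table: (first relation, second relation) -> inferred relation.
RULES = {
    ("spouse", "mother"): "mother-in-law",
    ("father", "father"): "paternal grandfather",
}


def compose(first: Triple, second: Triple) -> Optional[Triple]:
    """Chain two triples if they share a bridge entity and a rule covers them."""
    if first.obj != second.subject:
        return None  # no bridge entity, so the two hops do not connect
    relation = RULES.get((first.relation, second.relation))
    if relation is None:
        return None  # no logical rule for this pair of relations
    return Triple(first.subject, relation, second.obj)


def make_example(first: Triple, second: Triple) -> Optional[dict]:
    """Fill a question template and keep both hops as the evidence path."""
    inferred = compose(first, second)
    if inferred is None:
        return None
    return {
        "question": f"Who is the {inferred.relation} of {inferred.subject}?",
        "answer": inferred.obj,
        "evidence": [first, second],
    }


# Placeholder entities A, B, C stand in for real Wikidata items.
hop1 = Triple("A", "spouse", "B")
hop2 = Triple("B", "mother", "C")
example = make_example(hop1, hop2)
print(example["question"])
print(example["answer"])
```

Because the question names only the inferred relation, answering it requires both hops, and the stored evidence list records the reasoning path that a model's explanation could be scored against.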



