MuSiQue: Multi-hop Questions via Single-hop Question Composition

08/02/2021
by Harsh Trivedi, et al.

To build challenging multi-hop question answering datasets, we propose a bottom-up, semi-automatic process for constructing multi-hop questions via composition of single-hop questions. Constructing multi-hop questions as compositions of single-hop questions allows us to exercise greater control over the quality of the resulting multi-hop questions. This process allows building a dataset with (i) connected reasoning, where each step needs the answer from a previous step; (ii) minimal train-test leakage, by eliminating even partial overlap of reasoning steps; (iii) a variable number of hops and composition structures; and (iv) contrasting unanswerable questions, by modifying the context. We use this process to construct a new multi-hop QA dataset, MuSiQue-Ans, with 25K 2-4 hop questions using seed questions from 5 existing single-hop datasets. Our experiments demonstrate that MuSiQue is challenging for state-of-the-art QA models (e.g., a human-machine gap of 30 F1 pts), significantly harder than existing datasets (2x the human-machine gap), and substantially less cheatable (e.g., a single-hop model is worse by 30 F1 pts). We also build an even more challenging dataset, MuSiQue-Full, consisting of answerable and unanswerable contrast question pairs, where model performance drops further by 13+ F1 pts. For data and code, see <https://github.com/stonybrooknlp/musique>.
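The composition idea above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: two hops count as "connected" when the earlier hop's answer appears verbatim in the later question, that answer is replaced by a placeholder `#k` referring to step k, and "leakage" means any single-hop step shared between training and test chains. The names `SingleHop`, `compose_chain`, and `leaks` are hypothetical. Only linear chains are handled, and this is not the authors' pipeline (their data and code are in the repository linked above).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SingleHop:
    """Hypothetical single-hop question paired with its gold answer."""
    question: str
    answer: str


def compose_chain(hops: List[SingleHop]) -> List[str]:
    """Compose single-hop questions into one linear multi-hop chain.

    Assumption: hop k+1 is connected to hop k only if hop k's answer appears
    verbatim in hop k+1's question. That answer is replaced by "#k", so the
    later step cannot be answered without first answering step k.
    """
    # MuSiQue-Ans questions have 2-4 hops; this sketch enforces the same range.
    if not 2 <= len(hops) <= 4:
        raise ValueError("expected between 2 and 4 hops")
    steps = [hops[0].question]
    for k in range(1, len(hops)):
        prev_answer = hops[k - 1].answer
        question = hops[k].question
        if prev_answer not in question:
            # Disconnected hop: it could be answered without step k.
            raise ValueError(f"hop {k + 1} does not use the answer of hop {k}")
        steps.append(question.replace(prev_answer, f"#{k}"))
    return steps


def leaks(train_chains: Iterable[List[SingleHop]], test_chain: List[SingleHop]) -> bool:
    """Return True if the test chain shares any single-hop step with training.

    Rejecting even one shared step approximates "eliminating even partial
    overlap of reasoning steps"; SingleHop is frozen, so it is hashable.
    """
    seen = {hop for chain in train_chains for hop in chain}
    return any(hop in seen for hop in test_chain)
```

For example, composing `SingleHop("Who wrote Hamlet?", "William Shakespeare")` with `SingleHop("Where was William Shakespeare born?", "Stratford-upon-Avon")` yields the steps `["Who wrote Hamlet?", "Where was #1 born?"]`. Building candidates bottom-up like this makes the connectedness and leakage checks simple filters over composed chains rather than properties one hopes crowdworkers preserve.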


