Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies

01/06/2021
by Mor Geva, et al.

A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly. In this work, we introduce StrategyQA, a question answering (QA) benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy. A fundamental challenge in this setup is how to elicit such creative questions from crowdsourcing workers, while covering a broad range of potential strategies. We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts. Moreover, we annotate each question with (1) a decomposition into reasoning steps for answering it, and (2) Wikipedia paragraphs that contain the answers to each step. Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs. Analysis shows that questions in StrategyQA are short, topic-diverse, and cover a wide range of strategies. Empirically, we show that humans perform well (87%) on this task, while our best baseline reaches an accuracy of ∼66%.

research
04/28/2022

Inferring Implicit Relations with Language Models

A prominent challenge for modern language understanding systems is the a...
research
05/23/2023

Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database

In Textual question answering (TQA) systems, complex questions often req...
research
11/02/2020

Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps

A multi-hop question answering (QA) dataset aims to test reasoning and i...
research
04/16/2022

Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

Multi-hop Question Answering (QA) is a challenging task since it require...
research
10/22/2022

Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge

We propose a novel open-domain question answering (ODQA) framework for a...
research
08/05/2023

A criterion for Artificial General Intelligence: hypothetic-deductive reasoning, tested on ChatGPT

We argue that a key reasoning skill that any advanced AI, say GPT-4, sho...
research
11/07/2022

NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering

Hybrid tabular-textual question answering (QA) requires reasoning from h...
