WikiWhy: Answering and Explaining Cause-and-Effect Questions

10/21/2022
by Matthew Ho, et al.

As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer, demonstrating the acquisition of implicit commonsense knowledge that is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer-and-explain condition, leaving significant room for future improvements.
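The sketch below illustrates the data shape described in the abstract: a "why" question, its answer, and a rationale given as a set of supporting statements. The field names and the example content are illustrative placeholders, not drawn from the released dataset.

```python
# Minimal sketch of a WikiWhy-style entry (hypothetical content).
from dataclasses import dataclass
from typing import List

@dataclass
class WikiWhyExample:
    question: str          # a "why" question grounded in a Wikipedia fact
    answer: str            # the cause (or effect) that answers the question
    rationale: List[str]   # supporting statements connecting question to answer

example = WikiWhyExample(
    question="Why do metal roofs last longer than asphalt shingle roofs?",
    answer="Metal is far more resistant to weathering.",
    rationale=[
        "Asphalt shingles degrade under repeated exposure to UV light and moisture.",
        "Metal panels shed water and tolerate UV exposure with little damage.",
        "A material that resists weathering keeps its structural integrity longer.",
    ],
)
```

In the paper's end-to-end condition, a model must produce both the answer and the rationale, and the rationale is then judged for correctness (for example by human evaluation, as with the GPT-3 baselines above).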


Related Research

09/25/2018
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Existing question answering (QA) datasets fail to train QA systems to pe...

09/23/2021
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
A riddle is a question or statement with double or veiled meanings, foll...

03/18/2023
A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering
Recently, end-to-end trained models for multiple-choice commonsense ques...

04/28/2022
Inferring Implicit Relations with Language Models
A prominent challenge for modern language understanding systems is the a...

05/18/2020
Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
Numerical reasoning is often important to accurately understand the worl...

05/02/2020
Measuring and Reducing Non-Multifact Reasoning in Multi-hop Question Answering
The measurement of true progress in multihop question-answering has been...

09/22/2020
SQuARE: Semantics-based Question Answering and Reasoning Engine
Understanding the meaning of a text is a fundamental challenge of natura...
