CRASS: A Novel Data Set and Benchmark to Test Counterfactual Reasoning of Large Language Models

12/22/2021
by Jörg Frohberg, et al.

We introduce the CRASS (counterfactual reasoning assessment) data set and benchmark utilizing questionized counterfactual conditionals as a novel and powerful tool to evaluate large language models. We present the data set design and benchmark as well as the accompanying API that supports scoring against a crowd-validated human baseline. We test six state-of-the-art models against our benchmark. Our results show that it poses a valid challenge for these models and opens up considerable room for their improvement.
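
To make the scoring setup concrete, below is a minimal sketch of how a harness might evaluate a model on multiple-choice questionized counterfactual conditionals (QCCs) and compare its accuracy against a crowd-validated human baseline. The `QCCItem` structure, the `query_model` stub, the example item, and the baseline figure are all illustrative assumptions, not the actual CRASS API.

```python
from dataclasses import dataclass


@dataclass
class QCCItem:
    """One questionized counterfactual conditional (QCC) with answer options."""
    premise: str        # base premise, e.g. "A woman sees a fire."
    question: str       # questionized counterfactual, e.g. "What would have happened if ..."
    options: list[str]  # candidate consequences, exactly one of which is correct
    gold: int           # index of the correct option


def query_model(prompt: str, options: list[str]) -> int:
    """Hypothetical model stub. A real harness would score each option
    (e.g. by its log-likelihood under the model being tested) and return
    the argmax; this placeholder always picks option 0."""
    return 0


def accuracy(items: list[QCCItem]) -> float:
    """Fraction of QCCs on which the model selects the gold answer."""
    correct = sum(
        query_model(f"{item.premise} {item.question}", item.options) == item.gold
        for item in items
    )
    return correct / len(items)


if __name__ == "__main__":
    # Illustrative item in the spirit of CRASS; not taken from the data set.
    items = [
        QCCItem(
            premise="A woman sees a fire.",
            question="What would have happened if the woman had touched the fire?",
            options=[
                "She would have seen smoke.",
                "She would have been burned.",
                "That is not possible.",
            ],
            gold=1,
        ),
    ]
    human_baseline = 0.98  # placeholder; the paper reports a crowd-validated figure
    print(f"model accuracy: {accuracy(items):.2%} (human baseline: {human_baseline:.2%})")
```

Replacing `query_model` with per-option likelihood scoring under the model under test is the usual way such multiple-choice benchmarks are run; the accuracy gap to the human baseline then quantifies the room for improvement the abstract mentions.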
