Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models

05/23/2023
by Shashank Sonkar, et al.

We explore whether Large Language Models (LLMs) are capable of logical reasoning with distorted facts, which we call Deduction under Perturbed Evidence (DUPE). DUPE presents a unique challenge to LLMs since they typically rely on their parameters, which encode mostly accurate information, to reason and make inferences. In DUPE, however, LLMs must reason over manipulated or falsified evidence present in their prompts, which can result in false conclusions that are valid only under the manipulated evidence. Our goal with DUPE is to determine whether LLMs can arrive at these false conclusions and to identify whether the dominant factor influencing the deduction process is the data encoded in the parameters or the manipulated evidence in the prompts. To evaluate the DUPE capabilities of LLMs, we create a DUPEd version of the StrategyQA dataset, in which facts are manipulated to reverse the answer to each question. Our findings show that even the most advanced GPT models struggle to reason over manipulated facts - showcasing poor DUPE skills - with accuracy dropping by 45%. We also test prompt settings inspired by student simulation models, which mitigate the accuracy drop to some extent. Our findings have practical implications for understanding the performance of LLMs in real-world applications such as student simulation models that involve reasoning over inaccurate information.
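As a concrete illustration of the evaluation the abstract describes, the sketch below builds a DUPE-style prompt from a perturbed fact and checks whether a model's answer follows the manipulated evidence in the prompt rather than its parametric knowledge. This is a minimal sketch under stated assumptions: the `query_llm` stub, the `duped_prompt` helper, and the Eiffel Tower item are hypothetical stand-ins, not taken from the paper or the StrategyQA dataset.

```python
# Minimal sketch of a DUPE-style check over a StrategyQA-like yes/no item.
# Everything here is illustrative: `query_llm` is a stand-in for a real
# model API, and the example item is not from the paper's dataset.

def query_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned answer so the
    sketch runs end to end. Replace with your chat-completion API."""
    return "no"

def duped_prompt(question: str, perturbed_facts: list[str]) -> str:
    """Build a prompt that asks the model to reason ONLY from the
    manipulated evidence, mirroring the DUPE setup described above."""
    facts = "\n".join(f"- {fact}" for fact in perturbed_facts)
    return (
        "Use ONLY the facts below to answer, even if they contradict "
        "what you believe to be true.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}\n"
        "Answer with yes or no."
    )

def dupes_correctly(answer: str, flipped_label: str) -> bool:
    """A model handles DUPE correctly when it outputs the answer entailed
    by the perturbed facts, i.e. the reverse of the original label."""
    return answer.strip().lower().startswith(flipped_label)

if __name__ == "__main__":
    # Original answer: yes (the Eiffel Tower is about 330 m tall).
    # The perturbed fact below flips the entailed answer to "no".
    prompt = duped_prompt(
        question="Is the Eiffel Tower taller than 300 meters?",
        perturbed_facts=["The Eiffel Tower is 250 meters tall."],
    )
    print("DUPEd correctly:", dupes_correctly(query_llm(prompt), "no"))
```

Under this protocol, DUPE accuracy would be the fraction of items where the model's answer matches the flipped label; comparing it against accuracy on the unperturbed questions yields the kind of drop the abstract reports.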

Related research

06/11/2020 - Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge
To what extent can a neural network systematically reason over symbolic ...

06/18/2020 - Pre-trained Language Models as Symbolic Reasoners over Knowledge?
How can pre-trained language models (PLMs) learn factual knowledge from ...

04/07/2020 - Guessing What's Plausible But Remembering What's True: Accurate Neural Reasoning for Question-Answering
Neural approaches to natural language processing (NLP) often fail at the...

08/31/2023 - Experimenting with ChatGPT for Spreadsheet Formula Generation: Evidence of Risk in AI Generated Spreadsheets
Large Language Models (LLM) have become sophisticated enough that comple...

07/20/2023 - Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting
Language models can be prompted to reason through problems in a manner t...

05/10/2023 - RECKONING: Reasoning through Dynamic Knowledge Encoding
Recent studies on transformer-based language models show that they can a...

03/20/2013 - A Reason Maintenance System Dealing with Vague Data
A reason maintenance system which extends an ATMS through Mukaidono's fu...
