Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

07/20/2023
by Rylan Schaeffer, et al.

Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.
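To make the comparison concrete, the sketch below contrasts a logically valid CoT exemplar with a logically invalid one that still states the correct final answer, in the spirit of the setup described above. This is a minimal illustration, not the paper's actual prompts or evaluation code: the `query_model` function is a hypothetical stand-in for any LLM completion call, and the exemplar text and BBH-style dataset are placeholders.

```python
# Minimal sketch of comparing valid vs. invalid CoT exemplars on a task split.
# Assumptions: `query_model` is a hypothetical stand-in for an LLM API call;
# the exemplars and dataset format are illustrative, not from the paper.

VALID_COT = (
    "Q: I have two apples and buy three more. How many apples do I have?\n"
    "A: I start with 2 apples. Buying 3 more gives 2 + 3 = 5. The answer is 5.\n"
)

# Same surface form and final answer, but the intermediate reasoning is invalid.
INVALID_COT = (
    "Q: I have two apples and buy three more. How many apples do I have?\n"
    "A: Apples are red, and red has three letters, so 2 + 3 = 5. The answer is 5.\n"
)


def build_prompt(exemplar: str, question: str) -> str:
    """Prepend a single CoT exemplar to the target question."""
    return f"{exemplar}\nQ: {question}\nA:"


def query_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError


def accuracy(exemplar: str, dataset: list[tuple[str, str]]) -> float:
    """Fraction of (question, gold answer) pairs whose completion contains the gold answer."""
    hits = 0
    for question, gold in dataset:
        completion = query_model(build_prompt(exemplar, question))
        hits += int(gold in completion)
    return hits / len(dataset)


# Usage (illustrative): run both prompting conditions on the same task split.
# bbh_split = [("<question>", "<gold answer>"), ...]
# print(accuracy(VALID_COT, bbh_split), accuracy(INVALID_COT, bbh_split))
```

Under the paper's finding, one would expect the two conditions to yield similar accuracies, which is the sense in which logical validity of the exemplar reasoning appears not to drive the gains.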

