Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

05/23/2023
by   Angelica Chen, et al.

Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criterion for valid multi-step reasoning and propose two types of self-consistency that are particularly important for multi-step logic: hypothetical consistency (a model's ability to predict what its output would be in a hypothetical other context) and compositional consistency (consistency of a model's outputs for a compositional task even when an intermediate step is replaced with the model's own output for that step). We demonstrate that four sizes of the GPT-3 model exhibit poor consistency rates across both types of consistency on four different tasks (Wikipedia, DailyDialog, arithmetic, and GeoQuery).
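The two definitions in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the `Model` interface (prompt in, text out), the function names, the meta-prompt wording, and the lookup-table `toy_model` are all assumptions made for illustration only.

```python
from typing import Callable, Sequence

# Hypothetical interface: a model is any function from a prompt to text.
Model = Callable[[str], str]


def compositional_consistent(model: Model, full: str, sub: str) -> bool:
    """True if the model's answer to `full` is unchanged after the
    sub-expression `sub` is replaced by the model's own answer to it."""
    sub_answer = model(sub)
    substituted = full.replace(sub, sub_answer, 1)
    return model(full) == model(substituted)


def hypothetical_consistent(model: Model, prompt: str,
                            options: Sequence[str]) -> bool:
    """True if the option the model claims it would output for `prompt`
    equals what it actually outputs. The meta-prompt wording is assumed."""
    listing = "\n".join(f"{i}: {opt}" for i, opt in enumerate(options))
    meta_prompt = (
        f"Prompt: {prompt}\nOptions:\n{listing}\n"
        "Which option would you output for this prompt? "
        "Answer with its number."
    )
    reply = model(meta_prompt).strip()
    # An unparseable or out-of-range reply counts as inconsistent.
    if not reply.isdigit() or int(reply) >= len(options):
        return False
    return options[int(reply)] == model(prompt)


def consistency_rate(results: Sequence[bool]) -> float:
    """Fraction of examples judged consistent (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM: a lookup table with one deliberately wrong
    intermediate answer. Unknown prompts get the reply "0"."""
    answers = {
        "(2 + 3) * 4": "20",
        "(2 + 3)": "6",  # wrong on purpose
        "6 * 4": "24",
    }
    return answers.get(prompt, "0")


if __name__ == "__main__":
    print(compositional_consistent(toy_model, "(2 + 3) * 4", "(2 + 3)"))
    print(hypothetical_consistent(toy_model, "(2 + 3) * 4", ["20", "24"]))
```

In this toy setup the model answers the full expression correctly yet is compositionally inconsistent, because substituting its own (wrong) intermediate answer changes the final output. This shows why consistency is measured separately from correctness.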

research
03/21/2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models

We explore a simple ensemble strategy, self-consistency, that significan...
research
05/19/2023

Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs

A popular approach for improving the correctness of output from large la...
research
05/24/2023

Discriminator-Guided Multi-step Reasoning with Language Models

In the context of multi-step reasoning, language models (LMs) probabilit...
research
05/01/2023

Learning to Reason and Memorize with Self-Notes

Large language models have been shown to struggle with limited context m...
research
04/28/2023

Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning

Language models have been successfully applied to a variety of reasonin...
research
06/22/2023

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

The task of empowering large language models (LLMs) to accurately expres...
research
05/23/2023

Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models

Debiasing methods that seek to mitigate the tendency of Language Models ...
