It Takes One to Tango but More Make Trouble? In-Context Training with Different Number of Demonstrations

03/14/2023
by   Jiuhai Chen, et al.

Large language models (LLMs) are capable of performing complex reasoning via in-context learning (ICL) when provided with a few input-output demonstrations (demos), and become more powerful when the demos include intermediate reasoning steps ("chain of thought" (CoT)). Is it necessary to use multiple demos in ICL? In this paper, we study ICL using fewer demos per test query on the tasks in <cit.>. Surprisingly, we do not observe significant degradation when using only one randomly chosen demo. To study this phenomenon, for each test query, we categorize demos into "correct demos" leading to the correct answer and "wrong demos" resulting in wrong answers. Our analysis reveals an inherent bias in these widely studied datasets: most demos are correct for a majority of test queries, which explains the good performance of a single random demo. Moreover, ICL (with and without CoT) using only one correct demo significantly outperforms the all-demo ICL adopted by most previous works, indicating a weakness of LLMs in identifying correct demo(s) for input queries, which is difficult to evaluate on the biased datasets. Furthermore, we observe a counterintuitive behavior of multi-demo ICL: its accuracy degrades (improves) when given more correct (wrong) demos. This implies that ICL can be easily misguided by interference among demos and their spurious correlations. Our analyses highlight several fundamental challenges that need to be addressed in LLM training, ICL, and benchmark design.
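The demo-categorization procedure described above can be sketched as follows. This is a minimal, hypothetical illustration (the function names, prompt format, and the `answer_fn` model stand-in are assumptions, not the paper's actual code): a demo is labeled "correct" for a test query if one-shot ICL with that demo alone yields the gold answer, and "wrong" otherwise.

```python
# Hypothetical sketch of per-query demo categorization: a demo is "correct"
# for a query if a one-shot prompt built from that demo alone leads the
# model to the gold answer. `answer_fn` stands in for an actual LLM call.

def build_one_shot_prompt(demo, query):
    """Format a single (input, output) demo followed by the test query."""
    demo_q, demo_a = demo
    return f"Q: {demo_q}\nA: {demo_a}\n\nQ: {query}\nA:"

def categorize_demos(demos, query, gold_answer, answer_fn):
    """Split demos into 'correct' and 'wrong' for a given test query,
    judged by whether one-shot ICL with each demo yields the gold answer."""
    correct, wrong = [], []
    for demo in demos:
        prompt = build_one_shot_prompt(demo, query)
        if answer_fn(prompt) == gold_answer:
            correct.append(demo)
        else:
            wrong.append(demo)
    return correct, wrong
```

With a real model behind `answer_fn`, running this over all (query, demo) pairs would reproduce the kind of dataset-bias analysis the abstract describes, i.e., measuring what fraction of demos are "correct" for each test query.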
