Boosting Theory-of-Mind Performance in Large Language Models via Prompting

04/22/2023
by Shima Rahimi Moghaddam, et al.

Large language models (LLMs) excel in many tasks in 2023, but they still face challenges in complex reasoning. Theory-of-mind (ToM) tasks, which require understanding agents' beliefs, goals, and mental states, are essential for common-sense reasoning involving humans, making it crucial to enhance LLM performance in this area. This study measures the ToM performance of GPT-4 and three GPT-3.5 variants (Davinci-2, Davinci-3, GPT-3.5-Turbo), and investigates the effectiveness of in-context learning in improving their ToM comprehension. We evaluated prompts featuring two-shot chain-of-thought reasoning and step-by-step thinking instructions. We found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacities.
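
As a concrete illustration of the prompting setup described above, the sketch below assembles a two-shot chain-of-thought prompt for a false-belief (unexpected-transfer) ToM question and queries GPT-4 through the OpenAI Python client. The exemplar stories, reasoning traces, and the helper `ask_tom_question` are illustrative assumptions, not the paper's actual prompts or code.

```python
# Minimal sketch of two-shot chain-of-thought ToM prompting, assuming the
# OpenAI Python client (pip install openai). The exemplars below are
# illustrative stand-ins, not the prompts used in the study.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two worked false-belief exemplars with explicit reasoning (chain of thought).
EXEMPLARS = """\
Story: Anna puts her keys in the drawer and leaves. While she is gone, Ben
moves the keys to the shelf. Anna returns.
Question: Where will Anna look for her keys?
Reasoning: Anna last saw the keys in the drawer. She did not see Ben move
them, so her belief is outdated. She will act on her belief, not on reality.
Answer: In the drawer.

Story: A bag labeled "chocolate" is actually full of popcorn. Sam has never
looked inside. Sam reads the label.
Question: What does Sam believe is in the bag?
Reasoning: Sam's only evidence is the label, which says chocolate. Having
never seen the contents, Sam's belief follows the label, not the contents.
Answer: Chocolate.
"""

def ask_tom_question(story: str, question: str, model: str = "gpt-4") -> str:
    """Query the model with two-shot CoT exemplars plus a step-by-step cue."""
    prompt = (
        EXEMPLARS
        + f"\nStory: {story}\nQuestion: {question}\n"
        + "Reasoning: Let's think step by step."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic decoding for evaluation
    )
    return response.choices[0].message.content

print(ask_tom_question(
    "Maria leaves her chocolate on the table and goes outside. Her brother "
    "moves it to the cupboard while she is away.",
    "Where will Maria look for her chocolate when she comes back?",
))
```

Dropping the exemplars and the "Let's think step by step" cue from the assembled prompt recovers the zero-shot baseline that the study compares against.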


Related Research

research · 06/25/2023
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Language models still struggle on moral reasoning, despite their impress...

research · 02/16/2023
Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
Intuitive psychology is a pillar of common-sense reasoning. The replicat...

research · 01/27/2012
The thermodynamic cost of fast thought
After more than sixty years, Shannon's research [1-3] continues to raise...

research · 09/16/2023
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning
Large language models primarily rely on in-context learning to execute ta...

research · 07/24/2023
Interpretable Stereotype Identification through Reasoning
Given that language models are trained on vast datasets that may contain...

research · 06/13/2023
Synapse: Leveraging Few-Shot Exemplars for Human-Level Computer Control
This paper investigates the design of few-shot exemplars for computer au...

research · 10/04/2022
ThinkSum: Probabilistic reasoning over sets using large language models
Large language models (LLMs) have a substantial capacity for high-level ...
