Understanding Social Reasoning in Language Models with Language Models

06/21/2023
by Kanishk Gandhi, et al.

As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models align with human ToM remains a nuanced topic of exploration. This is primarily due to two distinct challenges: (1) inconsistent results from previous evaluations, and (2) concerns about the validity of existing evaluation methodologies. To address these challenges, we present a novel framework for procedurally generating evaluations with LLMs by populating causal templates. Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs, consisting of 25 controls and 5,000 model-written evaluations. We find that human participants rate the quality of our benchmark higher than previous crowd-sourced evaluations and comparable to expert-written evaluations. Using BigToM, we evaluate the social reasoning capabilities of a variety of LLMs and compare model performance with human performance. Our results suggest that GPT-4 has ToM capabilities that mirror human inference patterns, though less reliably, while other LLMs struggle.
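To make the idea of "populating causal templates" concrete, here is a minimal sketch of what generating one evaluation item from a template might look like. The template text, slot names, and filler values are all illustrative assumptions, not the paper's actual prompts; in the paper's framework an LLM proposes the fillers, whereas here they are supplied directly.

```python
# Hypothetical causal template for a false-belief-style ToM item.
# Slot names ({agent}, {object}, ...) and wording are illustrative only.
CAUSAL_TEMPLATE = (
    "{agent} believes the {object} is in the {location}. "
    "While {agent} is away, the {object} is moved to the {new_location}. "
    "Where will {agent} look for the {object}?"
)

def populate(template: str, **variables: str) -> str:
    """Fill the template's causal slots to produce one evaluation item.

    In the described framework an LLM would generate the variable
    fillers; here we pass them in explicitly for illustration.
    """
    return template.format(**variables)

item = populate(
    CAUSAL_TEMPLATE,
    agent="Noor",
    object="ball",
    location="pantry",
    new_location="fridge",
)
print(item)
```

Because every item is instantiated from the same causal structure, matched control conditions (e.g. the agent witnessing the move) can be generated by varying a single slot while holding the rest fixed.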


Related research

- Metacognitive Prompting Improves Understanding in Large Language Models (08/10/2023)
- Shaping the Emerging Norms of Using Large Language Models in Social Computing Research (07/09/2023)
- Large Language Models Encode Clinical Knowledge (12/26/2022)
- Theory of Mind May Have Spontaneously Emerged in Large Language Models (02/04/2023)
- ARB: Advanced Reasoning Benchmark for Large Language Models (07/25/2023)
- Re-Reading Improves Reasoning in Language Models (09/12/2023)
- Style Over Substance: Evaluation Biases for Large Language Models (07/06/2023)
