Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks

02/16/2023
by Tomer Ullman, et al.

Intuitive psychology is a pillar of common-sense reasoning. The replication of this reasoning in machine intelligence is an important stepping-stone on the way to human-like artificial intelligence. Several recent tasks and benchmarks for examining this reasoning in Large Language Models (LLMs) have focused in particular on belief attribution in Theory-of-Mind (ToM) tasks. These tasks have shown both successes and failures. We consider in particular a recent purported success case, and show that small variations that maintain the principles of ToM turn the results on their head. We argue that in general, the zero-hypothesis for model evaluation in intuitive psychology should be skeptical, and that outlying failure cases should outweigh average success rates. We also consider what possible future successes on Theory-of-Mind tasks by more powerful LLMs would mean for ToM tasks with people.


