AGENT: A Benchmark for Core Psychological Reasoning

by Tianmin Shu, et al.

For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is not clear whether such agents learn or hold the core psychological principles that drive human reasoning. Inspired by cognitive development studies on intuitive psychology, we present a benchmark consisting of a large dataset of procedurally generated 3D animations, AGENT (Action, Goal, Efficiency, coNstraint, uTility), structured around four scenarios (goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs) that probe key concepts of core intuitive psychology. We validate AGENT with human ratings, propose an evaluation protocol emphasizing generalization, and compare two strong baselines built on Bayesian inverse planning and a Theory of Mind neural network. Our results suggest that to pass the designed tests of core intuitive psychology at human levels, a model must acquire or have built-in representations of how agents plan, combining utility computations and core knowledge of objects and physics.




