Deep Interpretable Models of Theory of Mind For Human-Agent Teaming

by Ini Oguntola, et al.

When developing AI systems that interact with humans, it is essential to design both a system that can understand humans and a system that humans can understand. Most deep-network-based agent-modeling approaches are 1) not interpretable and 2) model only external behavior, ignoring internal mental states, which potentially limits their capability for assistance, interventions, and discovering false beliefs. To this end, we develop an interpretable modular neural framework for modeling the intentions of other observed entities. We demonstrate the efficacy of our approach with experiments on data from human participants in a search-and-rescue task in Minecraft, and show that incorporating interpretability can significantly increase predictive performance under the right conditions.

