Inverse POMDP: Inferring What You Think from What You Do

05/24/2018
by Zhengwei Wu, et al.

Complex behaviors are often driven by an internal model, which integrates sensory information over time and facilitates long-term planning. Inferring this internal model is crucial for interpreting the neural activity of agents and is beneficial for imitation learning. Here we describe a method to infer an agent's internal model and dynamic beliefs, and apply it to a simulated agent performing a foraging task. We assume the agent behaves rationally according to its understanding of the task and of the relevant causal variables that cannot be fully observed. We model this rational solution as a Partially Observable Markov Decision Process (POMDP). However, we allow that the agent may hold incorrect assumptions about the task, and our method learns these assumptions from the agent's actions. Given the agent's sensory observations and actions, we learn its internal model by maximum likelihood estimation over a set of task-relevant parameters. The Markov property of the POMDP enables us to characterize the transition probabilities between internal states and to iteratively estimate the agent's policy using a constrained Expectation-Maximization algorithm. We validate our method on simulated agents performing suboptimally on a foraging task, and successfully recover the agent's actual model.
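To make the structure of such an inference concrete, the sketch below shows one way the pipeline could look in Python. It is a simplified, hypothetical illustration rather than the authors' implementation: it assumes a two-site foraging world with binary cues, a belief update via a Bayes filter under the agent's assumed reward-switching rate (p_switch) and cue reliability (p_obs), a softmax-rational myopic policy with inverse temperature beta, and maximum-likelihood estimation by grid search standing in for the constrained Expectation-Maximization procedure described above. All names and parameters are illustrative assumptions.

# Minimal illustrative sketch (not the authors' implementation) of inferring an
# agent's assumed task parameters from its observation and action sequences.
# Assumptions made here for concreteness: a two-site foraging world with binary
# cues, a Bernoulli observation model, a softmax-rational myopic policy, and
# maximum-likelihood estimation by grid search in place of the constrained EM
# described in the paper.
import numpy as np

GO_SITE_0, GO_SITE_1 = 0, 1  # hypothetical action labels: visit site 0 or site 1

def belief_update(b, obs, p_switch, p_obs):
    """Bayes filter over which site currently holds the reward (state 0 or 1)."""
    # Prediction step under the agent's assumed reward-switching rate.
    T = np.array([[1.0 - p_switch, p_switch],
                  [p_switch, 1.0 - p_switch]])
    b_pred = T.T @ b
    # Correction step: obs == 1 means "cue points to site 0" with probability p_obs.
    likelihood = np.array([p_obs, 1.0 - p_obs]) if obs == 1 else np.array([1.0 - p_obs, p_obs])
    b_post = likelihood * b_pred
    return b_post / b_post.sum()

def action_loglik(observations, actions, p_switch, p_obs, beta=5.0):
    """Log-likelihood of the observed actions under a softmax-rational (myopic) policy."""
    b = np.array([0.5, 0.5])  # uniform initial belief
    ll = 0.0
    for obs, act in zip(observations, actions):
        b = belief_update(b, obs, p_switch, p_obs)
        q = b                          # expected one-step reward of visiting each site
        logits = beta * q
        log_policy = logits - np.log(np.exp(logits).sum())
        ll += log_policy[act]
    return ll

def infer_internal_model(observations, actions):
    """Maximum-likelihood estimate of the agent's assumed (p_switch, p_obs)."""
    grid = np.linspace(0.05, 0.95, 19)
    _, p_switch_hat, p_obs_hat = max(
        (action_loglik(observations, actions, ps, po), ps, po)
        for ps in grid for po in grid
    )
    return p_switch_hat, p_obs_hat

# Usage with toy sequences standing in for a recorded agent's data.
obs_seq = [1, 1, 0, 0, 1, 0, 0, 0]
act_seq = [GO_SITE_0, GO_SITE_0, GO_SITE_1, GO_SITE_1, GO_SITE_0, GO_SITE_1, GO_SITE_1, GO_SITE_1]
print(infer_internal_model(obs_seq, act_seq))

The grid search here recovers the switching rate and cue reliability that the agent's choices are most consistent with, which may differ from the true task parameters; that mismatch is exactly the "wrong assumptions" the method is designed to expose.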
