Uniqueness and Complexity of Inverse MDP Models

06/02/2022
by Marcus Hutter, et al.

What is the action sequence aa'a'' that was likely responsible for reaching state s''' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a''|ss''') can be used to answer them. In the traditional "forward" view, the transition "matrix" p(s'|sa) and the policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's''a''...|s), and with it the action-conditional state process p(s's''...|saa'a''), the multi-step inverse models p(aa'a''...|ss^(i)), where s^(i) denotes the state reached after i steps, and so on. If the latter is our primary concern, a natural question, analogous to the forward case, is to what extent the 1-step inverse model p(a|ss') plus the policy π(a|s) determine the multi-step inverse models or even the whole dynamics. In other words, can forward models be inferred from inverse models, or even be side-stepped altogether? This work addresses this question and variations thereof, as well as whether there are efficient decision/inference algorithms for it.
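To make the "forward" view concrete, here is a minimal sketch (not the paper's method) for a finite tabular MDP, assuming P[s][a][s2] = p(s2|sa), pi[s][a] = π(a|s), and the same action set in every state. It computes the 1-step inverse model by Bayes' rule, p(a|ss') = π(a|s) p(s'|sa) / Σ_b π(b|s) p(s'|sb), and a k-step inverse model p(a_1...a_k|ss^(k)) by summing over intermediate states. The function names inverse_model and multistep_inverse are hypothetical. The sketch only illustrates that the forward model and policy determine the inverse models; it does not address the converse question studied in the paper.

```python
import itertools


def inverse_model(P, pi, s, s_next):
    """1-step inverse model p(a | s, s') via Bayes' rule.

    Assumed layout (hypothetical): P[s][a][s2] = p(s2 | s, a), pi[s][a] = pi(a | s).
    """
    # Unnormalized joint p(a, s' | s) = pi(a | s) * p(s' | s, a)
    joint = [pi[s][a] * P[s][a][s_next] for a in range(len(pi[s]))]
    z = sum(joint)  # p(s' | s) under the policy
    if z == 0:
        raise ValueError("s_next is unreachable from s in one step under pi")
    return [j / z for j in joint]


def multistep_inverse(P, pi, s, s_end, k):
    """k-step inverse model p(a_1..a_k | s, s^(k)) by brute-force enumeration.

    Sums pi(a_t | s_t) * p(s_{t+1} | s_t, a_t) over all intermediate states,
    then normalizes over action sequences. Assumes every state has len(pi[0]) actions.
    """
    n_states, n_actions = len(P), len(pi[0])
    joint = {}
    for actions in itertools.product(range(n_actions), repeat=k):
        total = 0.0
        # k-1 intermediate states; for k == 1 this yields a single empty tuple
        for mids in itertools.product(range(n_states), repeat=k - 1):
            path = (s,) + mids + (s_end,)
            prob = 1.0
            for t, a in enumerate(actions):
                prob *= pi[path[t]][a] * P[path[t]][a][path[t + 1]]
            total += prob
        joint[actions] = total
    z = sum(joint.values())  # p(s^(k) = s_end | s) under the policy
    if z == 0:
        raise ValueError("s_end is unreachable from s in k steps under pi")
    return {acts: p / z for acts, p in joint.items()}


if __name__ == "__main__":
    # Toy deterministic MDP: action 0 keeps the state, action 1 switches it.
    P = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform policy
    print(inverse_model(P, pi, 0, 1))         # [0.0, 1.0]
    print(multistep_inverse(P, pi, 0, 0, 2))  # {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

The enumeration costs on the order of k·|A|^k·|S|^(k-1) operations and is meant only for tiny examples; propagating a state distribution forward step by step for each action sequence would replace the |S|^(k-1) factor with roughly k·|S|^2.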
