Task-Guided Inverse Reinforcement Learning Under Partial Information

05/28/2021
by   Franck Djeumou, et al.
0

We study the problem of inverse reinforcement learning (IRL), where the learning agent recovers a reward function using expert demonstrations. Most of the existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs), where an agent cannot directly observe the current state of the POMDP. The algorithm addresses several limitations of existing techniques that do not take the information asymmetry between the expert and the agent into account. First, it adopts causal entropy as the measure of the likelihood of the expert demonstrations as opposed to entropy in most existing IRL techniques and avoids a common source of algorithmic complexity. Second, it incorporates task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations, and may reduce the information asymmetry between the expert and the agent. Nevertheless, the resulting formulation is still nonconvex due to the intrinsic nonconvexity of the so-called forward problem, i.e., computing an optimal policy given a reward function, in POMDPs. We address this nonconvexity through sequential convex programming and introduce several extensions to solve the forward problem in a scalable manner. This scalability allows computing policies that incorporate memory at the expense of added computational cost yet also achieves higher performance compared to memoryless policies. We demonstrate that, even with severely limited data, the algorithm learns reward functions and policies that satisfy the task and induce a similar behavior to the expert by leveraging the side information and incorporating memory into the policy.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/30/2022

Task-Guided IRL in POMDPs that Scales

In inverse reinforcement learning (IRL), a learning agent infers a rewar...
research
09/21/2020

Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent Policy Optimization

In this paper, we study the problem of obtaining a control policy that c...
research
10/11/2017

Specification Inference from Demonstrations

Learning from expert demonstrations has received a lot of attention in a...
research
09/22/2017

Inverse Reinforcement Learning with Conditional Choice Probabilities

We make an important connection to existing results in econometrics to d...
research
05/21/2018

Learning Safe Policies with Expert Guidance

We propose a framework for ensuring safe behavior of a reinforcement lea...
research
08/04/2020

Deep Inverse Q-learning with Constraints

Popular Maximum Entropy Inverse Reinforcement Learning approaches requir...
research
09/09/2019

Policy Space Identification in Configurable Environments

We study the problem of identifying the policy space of a learning agent...

Please sign up or login with your details

Forgot password? Click here to reset