Inverse Reinforcement Learning With Constraint Recovery

05/14/2023
by Nirjhar Das, et al.

In this work, we propose a novel inverse reinforcement learning (IRL) algorithm for constrained Markov decision process (CMDP) problems. In standard IRL, the inverse learner seeks to recover the reward function of an MDP from a set of trajectory demonstrations of the optimal policy. Here, we seek to infer not only the reward function of the CMDP but also its constraints. Using the principle of maximum entropy, we show that the IRL with constraint recovery (IRL-CR) problem can be cast as a constrained non-convex optimization problem. We reduce it to an alternating constrained optimization problem whose sub-problems are convex, and solve those sub-problems with the exponentiated gradient descent algorithm. Finally, we demonstrate the efficacy of our algorithm in a grid-world environment.
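The abstract mentions solving the convex sub-problems with exponentiated gradient descent. The paper's exact formulation is not given here, so the following is only a generic sketch of the exponentiated gradient (multiplicative-weights) update on the probability simplex, applied to a toy linear objective; the function names and the objective are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def exponentiated_gradient(grad, x0, lr=0.1, steps=200):
    """Exponentiated gradient descent on the probability simplex.

    Multiplicative update: x <- x * exp(-lr * grad(x)), then renormalize
    so that x remains a valid probability distribution. This keeps every
    iterate feasible without an explicit projection step.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        x = x * np.exp(-lr * g)
        x /= x.sum()  # renormalize back onto the simplex
    return x

# Toy convex objective f(x) = <c, x> over the simplex; the minimizer
# concentrates all mass on the coordinate with the smallest cost.
c = np.array([3.0, 1.0, 2.0])
x_star = exponentiated_gradient(lambda x: c, np.ones(3) / 3, lr=0.5, steps=500)
```

Because the gradient of a linear objective is constant, the iterates converge to a point mass on the cheapest coordinate (index 1 here); in the paper's setting the same update would be driven by the gradients of the maximum-entropy sub-problems instead.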

Related research:

- 09/12/2019: "Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning". While most approaches to the problem of Inverse Reinforcement Learning (...
- 02/02/2023: "A general Markov decision process formalism for action-state entropy-regularized reward maximization". Previous work has separately addressed different forms of action, state ...
- 08/04/2020: "Deep Inverse Q-learning with Constraints". Popular Maximum Entropy Inverse Reinforcement Learning approaches requir...
- 05/21/2019: "Stochastic Inverse Reinforcement Learning". Inverse reinforcement learning (IRL) is an ill-posed inverse problem sin...
- 03/22/2022: "X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning". Inverse Reinforcement Learning (IRL) is a powerful way of learning from ...
- 01/24/2020: "Active Task-Inference-Guided Deep Inverse Reinforcement Learning". In inverse reinforcement learning (IRL), given a Markov decision process...
- 07/31/2019: "Inverse Reinforcement Learning with Multiple Ranked Experts". We consider the problem of learning to behave optimally in a Markov Deci...
