Identifiability and Generalizability in Constrained Inverse Reinforcement Learning

06/01/2023
by Andreas Schlaginhaufen, et al.

Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes. From a convex-analytic perspective, we extend prior results on reward identifiability and generalizability to both the constrained setting and a more general class of regularizations. In particular, we show that identifiability up to potential shaping (Cao et al., 2021) is a consequence of entropy regularization and may generally no longer hold for other regularizations or in the presence of safety constraints. We also show that to ensure generalizability to new transition laws and constraints, the true reward must be identified up to a constant. Additionally, we derive a finite sample guarantee for the suboptimality of the learned rewards, and validate our results in a gridworld environment.
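As a rough illustration of why entropy-regularized IRL can identify a reward at best up to potential shaping, the sketch below checks numerically that a reward r(s, a) and its shaped version r'(s, a) = r(s, a) + γ E[φ(s')] − φ(s) yield the same entropy-regularized optimal policy. The two-state MDP, the reward values, and the potential φ are hypothetical and chosen only for illustration; the example covers the unconstrained setting and only the invariance direction, not the paper's full result.

```python
import math

GAMMA = 0.9
# Hypothetical 2-state, 2-action MDP: P[s][a][t] is the probability of
# moving from state s to state t under action a.
P = [[[0.8, 0.2], [0.1, 0.9]],
     [[0.5, 0.5], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]   # illustrative reward r(s, a)
PHI = [3.0, -1.5]              # arbitrary potential phi(s)


def shape(reward, phi):
    """Return r'(s, a) = r(s, a) + gamma * E[phi(s')] - phi(s)."""
    return [[reward[s][a]
             + GAMMA * sum(P[s][a][t] * phi[t] for t in range(2))
             - phi[s]
             for a in range(2)] for s in range(2)]


def soft_optimal_policy(reward, iters=500):
    """Entropy-regularized (soft) value iteration; returns pi[s][a]."""
    v = [0.0, 0.0]
    for _ in range(iters):
        # Soft Bellman backup: Q = r + gamma * E[V], V = log-sum-exp of Q.
        q = [[reward[s][a]
              + GAMMA * sum(P[s][a][t] * v[t] for t in range(2))
              for a in range(2)] for s in range(2)]
        v = [math.log(sum(math.exp(x) for x in q[s])) for s in range(2)]
    # The soft-optimal policy is a softmax over Q in each state.
    return [[math.exp(q[s][a] - v[s]) for a in range(2)] for s in range(2)]


pi_original = soft_optimal_policy(R)
pi_shaped = soft_optimal_policy(shape(R, PHI))
# Largest difference between the two policies over all (s, a) pairs.
max_gap = max(abs(pi_original[s][a] - pi_shaped[s][a])
              for s in range(2) for a in range(2))
```

Under these assumptions `max_gap` should be zero up to floating-point error: shaping shifts the soft value function by −φ(s) and leaves the softmax policy unchanged, so expert demonstrations alone cannot distinguish r from r'.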

