Learning Safety Constraints from Demonstrations with Unknown Rewards

05/25/2023
by David Lindner, et al.

We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different unknown rewards without knowledge of the environment dynamics. CoCoRL constructs a convex safe set based on demonstrations, which provably guarantees safety even for potentially sub-optimal (but safe) demonstrations. For near-optimal demonstrations, CoCoRL converges to the true safe set with no policy regret. We evaluate CoCoRL in tabular environments and a continuous driving simulation with multiple constraints. CoCoRL learns constraints that lead to safe driving behavior and that can be transferred to different tasks and environments. In contrast, alternative methods based on Inverse Reinforcement Learning (IRL) often exhibit poor performance and learn unsafe policies.
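The convexity argument behind the safe set can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each demonstration is summarized by a feature-expectation vector and that the hidden constraint is linear in those features. Under that assumption, any convex combination of safe demonstrations is also safe, even though the constraint itself is never observed. The names `mix`, `cost`, `demos`, `hidden_cost`, and `threshold`, and all numbers, are hypothetical and chosen for illustration only.

```python
# Hypothetical sketch: names and numbers are illustrative assumptions,
# not taken from the paper.

def mix(weights, points):
    """Return the convex combination of the given vectors."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)]


def cost(cost_weights, features):
    """Linear cost: dot product of cost weights and feature expectations."""
    return sum(c * f for c, f in zip(cost_weights, features))


# Made-up feature expectations of three safe demonstrations.
demos = [[0.2, 0.7], [0.6, 0.1], [0.4, 0.4]]

# Hidden linear constraint (unknown to the learner): hidden_cost . f <= threshold.
hidden_cost = [1.0, 1.0]
threshold = 1.0

# Every demonstration satisfies the hidden constraint.
assert all(cost(hidden_cost, d) <= threshold for d in demos)

# A point in the convex hull of the demonstrations also satisfies it,
# because a linear cost of a convex combination is the same convex
# combination of the individual costs.
mixed = mix([0.5, 0.3, 0.2], demos)
mixed_is_safe = cost(hidden_cost, mixed) <= threshold
```

The learner in this sketch never reads `hidden_cost`; it only needs to stay inside the hull of the demonstrated vectors. Checking hull membership for an arbitrary point would additionally require solving a small linear program, which is omitted here.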

research
12/17/2018

Learning Constraints from Demonstrations

We extend the learning from demonstration paradigm by providing a method...
research
04/06/2023

Constraint Inference in Control Tasks from Expert Demonstrations via Inverse Optimization

Inferring unknown constraints is a challenging and crucial problem in ma...
research
07/26/2019

Learning Task Specifications from Demonstrations via the Principle of Maximum Causal Entropy

In many settings (e.g., robotics) demonstrations provide a natural way t...
research
09/29/2022

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

It is quite challenging to ensure the safety of reinforcement learning (...
research
11/09/2020

Uncertainty-Aware Constraint Learning for Adaptive Safe Motion Planning from Demonstrations

We present a method for learning to satisfy uncertain constraints from d...
research
06/09/2020

Constrained episodic reinforcement learning in concave-convex and knapsack settings

We propose an algorithm for tabular episodic reinforcement learning with...
research
08/09/2023

Bayesian Inverse Transition Learning for Offline Settings

Offline Reinforcement learning is commonly used for sequential decision-...
