Learning Soft Constraints From Constrained Expert Demonstrations

06/02/2022
by Ashish Gaurav, et al.

Inverse reinforcement learning (IRL) methods assume that the expert data is generated by an agent optimizing some reward function. However, in many settings, the agent may optimize a reward function subject to some constraints, where the constraints induce behaviors that may otherwise be difficult to express with a reward function alone. We consider the setting where the reward function is given and the constraints are unknown, and propose a method that can recover these constraints satisfactorily from the expert data. While previous work has focused on recovering hard constraints, our method can recover cumulative soft constraints that the agent satisfies on average per episode. In IRL fashion, our method solves this problem by iteratively adjusting the constraint function through a constrained optimization procedure until the agent's behavior matches the expert's behavior. Despite the simplicity of the formulation, our method is able to obtain good results. We demonstrate our approach on synthetic environments and real-world highway driving data.
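
To make the distinction between hard constraints and cumulative soft constraints concrete, here is a minimal sketch. It is not the paper's implementation: the episode representation, the names `episode_cost`, `satisfies_soft_constraint` and `satisfies_hard_constraint`, the threshold `beta`, the discount `gamma`, and the toy cost function are all assumptions chosen for illustration. It reads "satisfied on average per episode" as the mean cumulative cost across episodes staying at or below a threshold, which is one common way to formalize such a constraint.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: an episode is a sequence of (state, action) pairs.
Step = Tuple[float, float]
Episode = Sequence[Step]
CostFn = Callable[[float, float], float]


def episode_cost(episode: Episode, cost_fn: CostFn, gamma: float = 1.0) -> float:
    """Cumulative (optionally discounted) constraint cost of one episode."""
    return sum((gamma ** t) * cost_fn(s, a) for t, (s, a) in enumerate(episode))


def satisfies_soft_constraint(
    episodes: Sequence[Episode], cost_fn: CostFn, beta: float, gamma: float = 1.0
) -> bool:
    """Soft constraint: the mean cumulative cost over episodes is at most beta."""
    costs = [episode_cost(ep, cost_fn, gamma) for ep in episodes]
    return sum(costs) / len(costs) <= beta


def satisfies_hard_constraint(episodes: Sequence[Episode], cost_fn: CostFn) -> bool:
    """Hard constraint: no single step in any episode incurs any cost."""
    return all(cost_fn(s, a) == 0.0 for ep in episodes for (s, a) in ep)


# Toy cost (an assumption): large actions cost 1, everything else costs 0.
def toy_cost(state: float, action: float) -> float:
    return 1.0 if abs(action) > 1.0 else 0.0


episodes = [
    [(0.0, 0.5), (1.0, 1.5)],  # one costly step, cumulative cost 1
    [(0.0, 0.2), (1.0, 0.3)],  # no costly steps, cumulative cost 0
]

soft_ok = satisfies_soft_constraint(episodes, toy_cost, beta=0.5)
hard_ok = satisfies_hard_constraint(episodes, toy_cost)
```

In this toy data the first episode contains a costly step, so the hard check fails, while the mean cumulative cost of 0.5 still meets a soft threshold of 0.5. An agent can therefore violate the constraint occasionally and still satisfy it on average, which is the kind of behavior a soft constraint is meant to capture.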


research
09/12/2019

Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

While most approaches to the problem of Inverse Reinforcement Learning (...
research
11/19/2020

Inverse Constrained Reinforcement Learning

Standard reinforcement learning (RL) algorithms train agents to maximize...
research
02/16/2020

First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

In reinforcement learning, an agent attempts to learn high-performing be...
research
06/21/2023

Inverse Constraint Learning and Generalization by Transferable Reward Decomposition

We present the problem of inverse constraint learning (ICL), which recov...
research
04/13/2020

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation

Fashion is a complex social phenomenon. People follow fashion styles fro...
research
05/21/2018

Learning Safe Policies with Expert Guidance

We propose a framework for ensuring safe behavior of a reinforcement lea...
research
11/01/2021

On the Expressivity of Markov Reward

Reward is the driving force for reinforcement-learning agents. This pape...
