Learning Safe Policies with Expert Guidance

05/21/2018
by Jessie Huang, et al.

We propose a framework for ensuring safe behavior of a reinforcement learning agent when the reward function may be difficult to specify. In order to do this, we rely on the existence of demonstrations from expert policies, and we provide a theoretical framework for the agent to optimize in the space of rewards consistent with its existing knowledge. We propose two methods to solve the resulting optimization: an exact ellipsoid-based method and a method in the spirit of the "follow-the-perturbed-leader" algorithm. Our experiments demonstrate the behavior of our algorithm in both discrete and continuous problems. The trained agent safely avoids states with potential negative effects while imitating the behavior of the expert in the other states.
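To make the "follow-the-perturbed-leader" idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of a perturbed-leader loop for a maximin objective over linear reward weights in a small random tabular MDP. The helpers solve_mdp and feature_expectations are hypothetical stand-ins for standard value-iteration and occupancy computations, and the reward uncertainty set is simplified to the probability simplex rather than the demonstration-consistent set described in the paper.

```python
import numpy as np

# Minimal sketch of a follow-the-perturbed-leader loop for maximin policy
# learning over linear rewards r(s) = phi(s) . w, on a random tabular MDP.
# This is illustrative only; it simplifies the reward set to the simplex.

rng = np.random.default_rng(0)
n_states, n_actions, n_features, gamma = 5, 2, 3, 0.95
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
phi = rng.random((n_states, n_features))                          # state features

def solve_mdp(w, iters=200):
    """Greedy policy for the linear reward phi @ w, via value iteration."""
    r = phi @ w
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum('sat,t->sa', P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, iters=200):
    """Discounted feature expectations of a deterministic policy (uniform start)."""
    P_pi = P[np.arange(n_states), policy]   # transition matrix under the policy
    d = np.full(n_states, 1.0 / n_states)   # discounted state occupancy
    mu = np.zeros(n_features)
    for _ in range(iters):
        mu += d @ phi
        d = gamma * (d @ P_pi)
    return mu

# Expert feature expectations, here simulated with an arbitrary fixed policy.
mu_E = feature_expectations(rng.integers(n_actions, size=n_states))

# Follow-the-perturbed-leader: the adversary perturbs the accumulated margins
# and picks a worst-case corner of the simplex; the agent best-responds.
T = 50
cum_margin = np.zeros(n_features)
avg_mu = np.zeros(n_features)
for t in range(1, T + 1):
    noise = rng.exponential(scale=np.sqrt(t), size=n_features)
    w = np.zeros(n_features)
    w[np.argmin(cum_margin - noise)] = 1.0       # perturbed worst-case reward weights
    mu_pi = feature_expectations(solve_mdp(w))   # agent's best response to w
    cum_margin += mu_pi - mu_E
    avg_mu += mu_pi / T

# The averaged feature expectations approximate a policy maximizing the
# worst-case margin against the expert over this simplified reward set.
print("worst-case margin estimate:", float((avg_mu - mu_E).min()))
```

The exact ellipsoid-based method described in the abstract would replace the perturbation step with a cutting-plane update of the reward uncertainty set; the sketch above only shows the no-regret variant.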

Related research

07/15/2020 - Inverse Reinforcement Learning from a Gradient-based Learner
  Inverse Reinforcement Learning addresses the problem of inferring an exp...

06/17/2022 - Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
  Each year, expert-level performance is attained in increasingly-complex ...

05/28/2021 - Task-Guided Inverse Reinforcement Learning Under Partial Information
  We study the problem of inverse reinforcement learning (IRL), where the ...

05/31/2023 - ROSARL: Reward-Only Safe Reinforcement Learning
  An important problem in reinforcement learning is designing agents that ...

06/02/2022 - Learning Soft Constraints From Constrained Expert Demonstrations
  Inverse reinforcement learning (IRL) methods assume that the expert data...

12/30/2022 - Task-Guided IRL in POMDPs that Scales
  In inverse reinforcement learning (IRL), a learning agent infers a rewar...

02/15/2022 - Safe Reinforcement Learning by Imagining the Near Future
  Safe reinforcement learning is a promising path toward applying reinforc...
