Safe MDP Planning by Learning Temporal Patterns of Undesirable Trajectories and Averting Negative Side Effects

04/06/2023
by Siow Meng Low, et al.

In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects. In the real world, however, the state representation often lacks sufficient fidelity to specify such safety constraints, and operating on an incomplete model can produce unintended negative side effects (NSEs). To address these challenges, we first associate safety signals with state-action trajectories rather than with a single immediate state-action pair, which makes our safety model highly general. We also assume that categorical safety labels are given for different trajectories, rather than a numerical cost function, which is harder for the problem designer to specify. We then employ a supervised learning model to learn such non-Markovian safety patterns. Second, we develop a Lagrange multiplier method that incorporates the safety model and the underlying MDP model in a single computation graph to facilitate agent learning of safe behaviors. Finally, our empirical results on a variety of discrete and continuous domains show that this approach can satisfy complex non-Markovian safety constraints while optimizing an agent's total returns, is highly scalable, and also outperforms the previous best approach for Markovian NSEs.
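
The abstract gives no implementation details, so what follows is only a minimal sketch under assumptions of our own, not the authors' method. It illustrates the two ingredients in a toy setting. The first is a supervised classifier that maps a whole labeled trajectory to a probability of being unsafe. Here that classifier is a logistic regression over consecutive step pairs, a hypothetical feature choice. The second is a Lagrangian relaxation that trades return against that predicted unsafety. The paper uses a single computation graph trained over an MDP; this sketch swaps in a simpler technique instead. It selects among a fixed set of candidate trajectories and finds the multiplier lam by bisection. The names `train_safety_model`, `lagrangian_select`, `delta`, the toy trajectories, and their returns are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch, not the paper's implementation: trajectories are tuples of
# (state, action) pairs with categorical labels (1 = unsafe, 0 = safe).


def bigrams(traj):
    """Consecutive (step, next_step) pairs, so features can span two time steps."""
    return list(zip(traj, traj[1:]))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_safety_model(trajs, labels, epochs=500, lr=0.5):
    """Logistic regression on bigram counts, trained with plain SGD on the
    log-loss. Returns a function mapping a trajectory to P(unsafe)."""
    vocab = sorted({bg for t in trajs for bg in bigrams(t)})
    index = {bg: i for i, bg in enumerate(vocab)}
    w, b = [0.0] * len(vocab), 0.0

    def featurize(traj):
        x = [0.0] * len(vocab)
        for bg, n in Counter(bigrams(traj)).items():
            if bg in index:  # bigrams unseen during training are ignored
                x[index[bg]] = float(n)
        return x

    data = [(featurize(t), y) for t, y in zip(trajs, labels)]
    for _ in range(epochs):
        for x, y in data:
            # derivative of the log-loss with respect to the logit
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err

    def p_unsafe(traj):
        x = featurize(traj)
        return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

    return p_unsafe


def lagrangian_select(candidates, returns, cost_fn, delta, lam_hi=1e3, iters=60):
    """Maximize the Lagrangian R - lam * c over a fixed candidate set, where c is
    the predicted P(unsafe). lam is the smallest multiplier (found by bisection)
    whose maximizer satisfies c <= delta. Returns (chosen index, lam)."""
    costs = [cost_fn(t) for t in candidates]

    def best_response(lam):
        scores = [r - lam * c for r, c in zip(returns, costs)]
        return max(range(len(scores)), key=scores.__getitem__)

    if costs[best_response(0.0)] <= delta:
        return best_response(0.0), 0.0  # constraint inactive: lam stays at 0
    if costs[best_response(lam_hi)] > delta:
        raise ValueError("no feasible choice at lam_hi; raise lam_hi or delta")
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if costs[best_response(mid)] <= delta:
            hi = mid  # feasible: try a smaller penalty
        else:
            lo = mid  # still unsafe: penalize harder
    return best_response(hi), hi


# Toy usage. Only UNSAFE contains action "b" at s1 followed by action "a" at s2,
# although each of its individual steps also appears in some safe trajectory.
UNSAFE = (("s0", "a"), ("s1", "b"), ("s2", "a"))
SAFE_1 = (("s0", "a"), ("s1", "a"), ("s2", "a"))
SAFE_2 = (("s0", "a"), ("s1", "b"), ("s2", "b"))
model = train_safety_model([UNSAFE, SAFE_1, SAFE_2], [1, 0, 0])
choice, lam = lagrangian_select(
    [UNSAFE, SAFE_1, SAFE_2], [10.0, 6.0, 4.0], model, delta=0.2
)
```

With `delta=0.2`, the high-return UNSAFE trajectory is priced out, and the multiplier settles where SAFE_1 becomes the best trade-off. With a loose threshold, the constraint is inactive and `lam` stays at 0. Bisection is valid here because the predicted cost of the maximizer of R - lam * c never increases as lam grows.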

research
09/20/2019

Reconnaissance and Planning algorithm for constrained MDP

Practical reinforcement learning problems are often formulated as constr...
research
08/15/2020

Safe Reinforcement Learning in Constrained Markov Decision Processes

Safe reinforcement learning has been a promising approach for optimizing...
research
01/02/2022

Reinforcement Learning for Task Specifications with Action-Constraints

In this paper, we use concepts from supervisory control theory of discre...
research
10/19/2022

Provably Safe Reinforcement Learning via Action Projection using Reachability Analysis and Polynomial Zonotopes

While reinforcement learning produces very promising results for many ap...
research
12/04/2022

Automata Learning meets Shielding

Safety is still one of the major research challenges in reinforcement le...
research
01/05/2017

Learning local trajectories for high precision robotic tasks : application to KUKA LBR iiwa Cartesian positioning

To ease the development of robot learning in industry, two conditions ne...
research
07/10/2020

AGI Agent Safety by Iteratively Improving the Utility Function

While it is still unclear if agents with Artificial General Intelligence...
