SAUTE RL: Almost Surely Safe Reinforcement Learning Using State Augmentation

02/14/2022
by Aivar Sootla, et al.

Satisfying safety constraints almost surely (that is, with probability one) can be critical for deploying Reinforcement Learning (RL) in real-life applications. For example, plane landing and take-off should ideally occur with probability one. We address the problem by introducing Safety Augmented (Saute) Markov Decision Processes (MDPs), where the safety constraints are eliminated by augmenting them into the state space and reshaping the objective. We show that the Saute MDP satisfies the Bellman equation and moves us closer to solving Safe RL with constraints satisfied almost surely. We argue that the Saute MDP lets us view the Safe RL problem from a different perspective, enabling new features. For instance, our approach has a plug-and-play nature, i.e., any RL algorithm can be "sauteed". Additionally, state augmentation allows for policy generalization across safety constraints. Finally, we show that Saute RL algorithms can outperform their state-of-the-art counterparts when constraint satisfaction is of high importance.
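To make the idea of state augmentation concrete, here is a minimal Python sketch of an environment wrapper. The abstract does not give the construction, so everything here is an assumption for illustration: the names `ToyEnv` and `SauteedEnv`, the normalized remaining budget `z`, the undiscounted budget update, and the fixed penalty `unsafe_reward` are hypothetical choices, not the authors' implementation.

```python
class ToyEnv:
    """Hypothetical toy environment: reward and safety cost both grow with the action."""

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        reward = action          # larger actions earn more reward...
        cost = action ** 2       # ...but also incur more safety cost
        done = self.t >= 10      # fixed episode length of 10 steps
        return float(self.t), reward, cost, done


class SauteedEnv:
    """Illustrative wrapper (assumed design, not the paper's code).

    It appends the remaining safety budget to the observation and replaces
    the reward with a large penalty once the budget is exhausted, so the
    constraint no longer has to be handled separately by the RL algorithm.
    """

    def __init__(self, env, safety_budget, unsafe_reward=-100.0):
        self.env = env
        self.safety_budget = safety_budget
        self.unsafe_reward = unsafe_reward
        self.z = 1.0

    def reset(self):
        obs = self.env.reset()
        self.z = 1.0  # normalized remaining budget: 1.0 means untouched
        return (obs, self.z)

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        # Assumption: undiscounted budget; each cost consumes a share of it.
        self.z -= cost / self.safety_budget
        if self.z < 0:
            # Reshaped objective: the budget is overspent, so penalize.
            reward = self.unsafe_reward
        return (obs, self.z), reward, done


if __name__ == "__main__":
    env = SauteedEnv(ToyEnv(), safety_budget=1.0)
    state = env.reset()
    done = False
    while not done:
        state, reward, done = env.step(0.5)
        print(state, reward)
```

Because the wrapper exposes an ordinary observation and reward, any standard RL algorithm could in principle be trained on it unchanged, which is one way to read the "plug-and-play" claim. Feeding `z` to the policy is also what would let a single policy condition on different safety budgets.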


