A Lyapunov-based Approach to Safe Reinforcement Learning

05/20/2018
by   Yinlam Chow, et al.

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, it is crucial to guarantee the safety of the agent during training as well as deployment (e.g., a robot should avoid taking actions, exploratory or not, that irrevocably harm its hardware). To incorporate safety into RL, we derive algorithms under the framework of constrained Markov decision problems (CMDPs), an extension of standard Markov decision problems (MDPs) augmented with constraints on expected cumulative costs. Our approach hinges on a novel Lyapunov method: we define and present a method for constructing Lyapunov functions, which provide an effective way to guarantee the global safety of a behavior policy during training via a set of local, linear constraints. Leveraging these theoretical underpinnings, we show how to use the Lyapunov approach to systematically transform dynamic programming (DP) and RL algorithms into their safe counterparts. To illustrate their effectiveness, we evaluate these algorithms in several CMDP planning and decision-making tasks on a safety benchmark domain. Our results show that the proposed method significantly outperforms existing baselines in balancing constraint satisfaction and performance.
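To make the abstract's "local, linear constraints" concrete, the sketch below shows one Lyapunov-constrained policy-improvement step on a tabular CMDP: a Lyapunov function L is built from a feasible baseline policy, and each state's action distribution is then optimized by a small linear program subject to a single local, linear constraint derived from L, so that the improved policy's cumulative constraint cost from the initial state stays within the budget. This is a minimal illustrative sketch, not the paper's reference implementation: the arrays P (transition kernel), R (rewards), C (constraint costs), the budget d0, the initial state x0, and the constant auxiliary cost eps are all assumed, simplified choices.

    import numpy as np
    from scipy.optimize import linprog

    def policy_eval(P, per_step, pi, gamma):
        """Value of policy pi for an arbitrary per-step signal
        (reward or constraint cost) on a tabular MDP."""
        n_states = P.shape[0]
        P_pi = np.einsum('xa,xay->xy', pi, P)       # state transitions under pi
        r_pi = np.einsum('xa,xa->x', pi, per_step)  # expected per-step signal
        return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

    def safe_policy_improvement(P, R, C, pi_b, gamma, d0, x0):
        """One Lyapunov-constrained improvement step over baseline pi_b.
        Assumes pi_b is feasible, i.e. its constraint value at x0 is <= d0."""
        n_states, n_actions, _ = P.shape
        # Lyapunov function built on the baseline:
        #   L(x) = D_b(x) + eps / (1 - gamma),
        # where D_b is the baseline's constraint value function and eps is a
        # constant auxiliary per-step cost spending the slack in the budget.
        D_b = policy_eval(P, C, pi_b, gamma)
        eps = (1.0 - gamma) * max(d0 - D_b[x0], 0.0)
        L = D_b + eps / (1.0 - gamma)

        V_b = policy_eval(P, R, pi_b, gamma)                  # reward critic
        Q_r = R + gamma * np.einsum('xay,y->xa', P, V_b)      # reward Q-values
        Q_L = C + eps + gamma * np.einsum('xay,y->xa', P, L)  # Lyapunov Q-values

        pi_new = np.zeros_like(pi_b)
        for x in range(n_states):
            # Per-state LP over the action simplex: maximize expected reward
            # subject to the local linear constraint <pi, Q_L[x]> <= L(x).
            # pi_b itself satisfies it with equality, so the LP is feasible.
            res = linprog(c=-Q_r[x],
                          A_ub=Q_L[x][None, :], b_ub=[L[x]],
                          A_eq=np.ones((1, n_actions)), b_eq=[1.0],
                          bounds=[(0.0, 1.0)] * n_actions)
            pi_new[x] = res.x if res.success else pi_b[x]     # safe fallback
        return pi_new

Iterating this step, with the improved policy taken as the next baseline, gives a safe policy-iteration loop. The paper's construction is more general (in particular in how the auxiliary constraint cost is chosen), but the per-state linear program over the action simplex is the core mechanism by which a global safety guarantee is enforced through local, linear constraints.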

Related research

12/01/2021 · Safe Exploration for Constrained Reinforcement Learning with Provable Guarantees
We consider the problem of learning an episodic safe control policy that...

07/29/2021 · Lyapunov-based uncertainty-aware safe reinforcement learning
Reinforcement learning (RL) has shown a promising performance in learnin...

01/24/2023 · AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning
Safety is a critical hurdle that limits the application of deep reinforc...

05/25/2023 · C-MCTS: Safe Planning with Monte Carlo Tree Search
Many real-world decision-making tasks, such as safety-critical scenarios...

04/11/2023 · Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning
Interpretability of AI models allows for user safety checks to build tru...

02/20/2020 · From Stateless to Stateful Priorities: Technical Report
We present the notion of stateful priorities for imposing precise restri...

06/11/2021 · Safe Reinforcement Learning with Linear Function Approximation
Safety in reinforcement learning has become increasingly important in re...
