Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

11/25/2021
by Haitong Ma, et al.

In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how can a safe policy be learned without sufficient data or a prior model of the dangerous region? Existing methods mostly apply a posterior penalty to dangerous actions, which means that the agent is not penalized until it has experienced danger. As a result, the agent cannot learn a zero-violation policy even after convergence: if it did, it would no longer receive any penalty and would lose its knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, also called safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set on the action space, or the control safe set. We can therefore identify dangerous actions before taking them and obtain a zero-constraint-violation policy after convergence. We claim that the energy function can be learned in a model-free manner, similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that, under some assumptions, our Lagrangian-based solution ensures that the learned policy converges to the constrained optimum. The proposed algorithm is evaluated in complex simulation environments and in a hardware-in-loop (HIL) experiment with a real controller from an autonomous vehicle. Experimental results suggest that the converged policy achieves zero constraint violation in all environments, with performance comparable to model-based baselines.
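To make the idea of a Lagrangian-based solution with an energy-function constraint more concrete, the following is a minimal toy sketch, not the SSAC algorithm itself. Everything in it is an illustrative assumption rather than something stated in the abstract: the constraint form `phi_next - max(phi_now - eta, 0) <= 0`, the one-step scalar problem, the quadratic cost, and all names (`safety_constraint`, `solve_toy_problem`, `eta`, `target`). The sketch only shows how a primal-dual update can push an action to the boundary of a safe set defined by a safety index.

```python
def safety_constraint(phi_now, phi_next, eta=0.1):
    """Assumed constraint on the safety-index transition (hypothetical form).

    A value <= 0 means the action is treated as safe: the index must either
    stay non-positive or decrease by at least eta.
    """
    return phi_next - max(phi_now - eta, 0.0)


def solve_toy_problem(phi_now=-0.5, target=2.0, eta=0.1, lr=0.05, steps=2000):
    """Primal-dual gradient iteration on a one-step toy problem.

    Cost to minimize:   (action - target)^2
    Toy dynamics:       phi_next = phi_now + action   (assumption)
    Constraint:         safety_constraint(phi_now, phi_next) <= 0
    """
    action, lam = 0.0, 0.0
    for _ in range(steps):
        phi_next = phi_now + action
        violation = safety_constraint(phi_now, phi_next, eta)
        # Gradient of the Lagrangian w.r.t. the action; d(violation)/d(action) = 1.
        grad_action = 2.0 * (action - target) + lam
        # Descent step on the action, projected ascent step on the multiplier.
        action -= lr * grad_action
        lam = max(0.0, lam + lr * violation)
    return action, lam


if __name__ == "__main__":
    action, lam = solve_toy_problem()
    print(f"action={action:.4f}, multiplier={lam:.4f}")
```

With the default values, the unconstrained optimum (`action = 2.0`) would violate the assumed constraint, so the iteration settles on the constraint boundary with a positive multiplier. In the paper's setting the action would come from a learned policy and the safety index would be learned from data; this sketch replaces both with closed-form toy quantities.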


