Learn With Imagination: Safe Set Guided State-wise Constrained Policy Optimization

08/25/2023
by Weiye Zhao, et al.

Deep reinforcement learning (RL) excels in various control tasks, yet the absence of safety guarantees hampers its real-world applicability. In particular, exploration during learning usually results in safety violations, as the RL agent learns from those mistakes. On the other hand, safe control techniques ensure persistent safety satisfaction but demand strong priors on system dynamics, which are usually hard to obtain in practice. To address these problems, we present Safe Set Guided State-wise Constrained Policy Optimization (S-3PO), a pioneering algorithm that generates state-wise safe optimal policies with zero training violations, i.e., learning without mistakes. S-3PO first employs a safety-oriented monitor with black-box dynamics to ensure safe exploration. It then enforces a unique cost for the RL agent to converge to optimal behaviors within safety constraints. S-3PO outperforms existing methods in high-dimensional robotics tasks, managing state-wise constraints with zero training violations. This innovation marks a significant stride towards real-world safe RL deployment.
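The abstract describes the safety monitor only at a high level. The sketch below illustrates one common way a safe-set monitor can operate with black-box dynamics: it queries a simulator (without differentiating it) and only accepts actions whose predicted next state satisfies a safety-index condition, phi(x_next) <= max(phi(x) - eta, 0). This is not the authors' implementation. The function name `safe_set_monitor`, the safety index `phi`, the margin `eta`, the uniform action sampling, and the action bounds are all illustrative assumptions.

```python
import numpy as np


def safe_set_monitor(x, u_ref, dynamics, phi, eta=0.05, n_samples=512,
                     u_low=-1.0, u_high=1.0, rng=None):
    """Hypothetical safe-set monitor (a sketch, not the S-3PO code).

    dynamics(x, u) -> next state: black-box simulator, queried but not differentiated.
    phi(x) -> float: assumed safety index; the safe set is {x : phi(x) <= 0}.
    Returns (action, intervened).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    u_ref = np.asarray(u_ref, dtype=float)

    # Assumed safe-set condition: the next phi must not exceed max(phi(x) - eta, 0),
    # i.e. phi decreases by eta outside the safe set and the state stays inside otherwise.
    target = max(phi(x) - eta, 0.0)
    if phi(dynamics(x, u_ref)) <= target:
        return u_ref, False  # the policy's own action already satisfies the condition

    # Query the black-box dynamics on uniformly sampled candidate actions.
    candidates = rng.uniform(u_low, u_high, size=(n_samples, u_ref.size))
    next_phi = np.array([phi(dynamics(x, u)) for u in candidates])
    feasible = next_phi <= target

    if feasible.any():
        # Among feasible candidates, pick the one closest to the policy's action.
        safe = candidates[feasible]
        dists = np.linalg.norm(safe - u_ref, axis=1)
        return safe[np.argmin(dists)], True

    # No sampled action meets the condition: fall back to the one with the lowest next phi.
    return candidates[np.argmin(next_phi)], True
```

For example, with a 1-D point whose wall sits at position 1 (`phi = lambda x: float(x[0] - 1.0)`, `dyn = lambda x, u: x + 0.1 * u`), a reference action that would push the state past the wall is replaced by the nearest sampled action that keeps it inside. The `intervened` flag is one place where a training loop could attach an additional cost; how S-3PO actually defines its cost is not specified in this abstract.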
