Constrained Variational Policy Optimization for Safe Reinforcement Learning

01/28/2022
by Zuxin Liu, et al.

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before they are deployed in safety-critical applications. The primal-dual method, a prevalent constrained optimization framework, suffers from instability and lacks optimality guarantees. This paper addresses these issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization-style approach to learning safe policies. We show that the safe RL problem can be decomposed into 1) a convex optimization phase with a non-parametric variational distribution and 2) a supervised learning phase. We demonstrate the unique advantages of constrained variational policy optimization by proving its optimality and policy improvement stability. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction and sample efficiency than primal-dual baselines.
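To make the two-phase decomposition concrete, here is a minimal toy sketch of one EM-style iteration. It is not the paper's algorithm: the single state, the discrete action set, the fixed temperature `eta`, the bisection search for the multiplier `lam`, and all function names are assumptions chosen for illustration. The E-step reweights the old policy into a non-parametric distribution that satisfies a cost limit, and the M-step fits a softmax policy to that distribution by cross-entropy minimization.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    z = sum(w)
    return [x / z for x in w]


def expected(q, values):
    """Expectation of `values` under the distribution `q`."""
    return sum(p * v for p, v in zip(q, values))


def e_step(pi_old, q_reward, q_cost, cost_limit, eta=1.0, lam_max=100.0, iters=60):
    """Toy E-step: q(a) proportional to pi_old(a) * exp((Qr(a) - lam * Qc(a)) / eta).

    The multiplier lam >= 0 is found by bisection, which works because the
    expected cost under q is non-increasing in lam. Assumes pi_old(a) > 0
    and that lam_max is large enough to make the constraint feasible.
    """
    def tilt(lam):
        return softmax([math.log(p) + (r - lam * c) / eta
                        for p, r, c in zip(pi_old, q_reward, q_cost)])

    q = tilt(0.0)
    if expected(q, q_cost) <= cost_limit:
        return q, 0.0  # constraint inactive, no penalty needed
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected(tilt(mid), q_cost) > cost_limit:
            lo = mid
        else:
            hi = mid
    return tilt(hi), hi  # hi always stays on the feasible side


def m_step(logits, q, lr=1.0, steps=500):
    """Toy M-step: fit softmax(logits) to q by gradient descent on cross-entropy."""
    logits = list(logits)
    for _ in range(steps):
        pi = softmax(logits)
        # Gradient of -sum_a q(a) log pi(a) with respect to the logits is pi - q.
        logits = [l - lr * (p - t) for l, p, t in zip(logits, pi, q)]
    return softmax(logits)


# Hypothetical numbers: four actions, higher reward comes with higher cost.
pi_old = [0.25, 0.25, 0.25, 0.25]
q_reward = [1.0, 2.0, 3.0, 4.0]
q_cost = [0.0, 0.5, 1.0, 2.0]
cost_limit = 0.5

q, lam = e_step(pi_old, q_reward, q_cost, cost_limit)
pi_new = m_step([math.log(p) for p in pi_old], q)
```

In this toy setting the M-step has a closed-form solution (the logits equal `log q`), so the gradient loop is only there to mirror the supervised learning phase that a neural network policy would need.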


research
06/29/2023

Probabilistic Constraint for Safety-Critical Reinforcement Learning

In this paper, we consider the problem of learning safe policies for pro...
research
05/23/2023

Constrained Proximal Policy Optimization

The problem of constrained reinforcement learning (CRL) holds significan...
research
11/20/2019

Safe Policies for Reinforcement Learning via Primal-Dual Methods

In this paper, we study the learning of safe policies in the setting of ...
research
06/19/2019

Safe and Near-Optimal Policy Learning for Model Predictive Control using Primal-Dual Neural Networks

In this paper, we propose a novel framework for approximating the explic...
research
05/21/2020

Novel Policy Seeking with Constrained Optimization

In this work, we address the problem of learning to seek novel policies ...
research
07/30/2020

Chance Constrained Policy Optimization for Process Control and Optimization

Chemical process optimization and control are affected by 1) plant-model...
research
10/02/2022

Policy Gradients for Probabilistic Constrained Reinforcement Learning

This paper considers the problem of learning safe policies in the contex...
