A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

11/11/2020
by   Tengyu Xu, et al.

Safe reinforcement learning (SRL) problems are typically modeled as a constrained Markov decision process (CMDP), in which an agent explores the environment to maximize the expected total reward while avoiding violation of constraints on a number of expected total costs. In general, such SRL problems have nonconvex objective functions subject to multiple nonconvex constraints, and hence are very challenging to solve, particularly when a globally optimal policy is sought. Many popular SRL algorithms adopt a primal-dual structure, which relies on updating dual variables to satisfy the constraints. In contrast, we propose a primal approach, called constraint-rectified policy optimization (CRPO), which alternates policy updates between objective improvement and constraint satisfaction. CRPO provides a primal-type algorithmic framework for solving SRL problems, in which each policy update can take any variant of a policy optimization step. To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an 𝒪(1/√T) convergence rate to the globally optimal policy in the constrained policy set and an 𝒪(1/√T) error bound on constraint satisfaction. This is the first finite-time analysis of SRL algorithms with a global optimality guarantee. Our empirical results demonstrate that CRPO can significantly outperform the existing primal-dual baseline algorithms.
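The alternating update described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a one-state CMDP (a bandit) with a softmax policy, a single cost constraint, and exact expectations in place of sampled estimates. The names `crpo_bandit`, `limit`, `tolerance`, and `step` are hypothetical. In this one-state setting the NPG-style step is assumed to reduce to adding the step size times the per-action reward (or subtracting the per-action cost) from the policy logits. The returned policy averages the iterates at which a reward step was taken, which in a bandit has the same expected reward and cost as drawing one of those iterates uniformly at random.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected(values, probs):
    # Expected value of a per-action quantity under the policy.
    return sum(v * p for v, p in zip(values, probs))

def crpo_bandit(rewards, costs, limit, tolerance=0.02, step=0.1, iters=5000):
    """Toy constraint-rectified loop on a one-state problem (illustrative only)."""
    theta = [0.0] * len(rewards)      # softmax logits, one per action
    avg = [0.0] * len(rewards)        # running mean of policies at reward steps
    n_reward_steps = 0
    for _ in range(iters):
        probs = softmax(theta)
        if expected(costs, probs) <= limit + tolerance:
            # Constraint (approximately) satisfied: improve the objective.
            n_reward_steps += 1
            avg = [a + (p - a) / n_reward_steps for a, p in zip(avg, probs)]
            theta = [t + step * r for t, r in zip(theta, rewards)]
        else:
            # Constraint violated: take a step that reduces the expected cost.
            theta = [t - step * c for t, c in zip(theta, costs)]
    return avg, n_reward_steps

if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0]   # hypothetical per-action rewards
    costs = [1.0, 0.3, 0.0]     # hypothetical per-action costs
    policy, n = crpo_bandit(rewards, costs, limit=0.5)
    print("policy:", [round(p, 3) for p in policy])
    print("reward:", round(expected(rewards, policy), 3))
    print("cost:", round(expected(costs, policy), 3))
```

No dual variable appears anywhere in the loop: the constraint is enforced only by switching which quantity the policy step targets. Because every averaged iterate satisfies the relaxed constraint and the cost is linear in the action probabilities, the returned policy's expected cost is at most `limit + tolerance`.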


research
02/19/2018

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for re...
research
10/20/2021

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

The problem of constrained Markov decision process (CMDP) is investigate...
research
05/24/2022

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satis...
research
10/07/2020

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a rew...
research
04/11/2022

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, γ-discounted constr...
research
09/13/2021

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Reinforcement learning is widely used in applications where one needs to...
research
01/28/2019

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...
