DeepAI
Log In Sign Up

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

11/11/2020
by   Tengyu Xu, et al.
0

Safe reinforcement learning (SRL) problems are typically modeled as constrained Markov Decision Process (CMDP), in which an agent explores the environment to maximize the expected total reward and meanwhile avoids violating certain constraints on a number of expected total costs. In general, such SRL problems have nonconvex objective functions subject to multiple nonconvex constraints, and hence are very challenging to solve, particularly to provide a globally optimal policy. Many popular SRL algorithms adopt a primal-dual structure which utilizes the updating of dual variables for satisfying the constraints. In contrast, we propose a primal approach, called constraint-rectified policy optimization (CRPO), which updates the policy alternatingly between objective improvement and constraint satisfaction. CRPO provides a primal-type algorithmic framework to solve SRL problems, where each policy update can take any variant of policy optimization step. To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an 𝒪(1/√(T)) convergence rate to the global optimal policy in the constrained policy set and an 𝒪(1/√(T)) error bound on constraint satisfaction. This is the first finite-time analysis of SRL algorithms with global optimality guarantee. Our empirical results demonstrate that CRPO can outperform the existing primal-dual baseline algorithms significantly.

READ FULL TEXT

page 1

page 2

page 3

page 4

02/19/2018

Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Constrained Markov Decision Process (CMDP) is a natural framework for re...
10/20/2021

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

The problem of constrained Markov decision process (CMDP) is investigate...
05/24/2022

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satis...
10/07/2020

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a rew...
04/11/2022

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, γ-discounted constr...
01/28/2019

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it i...
07/12/2022

A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization

Nonconvex constrained optimization problems can be used to model a numbe...