Fast Global Convergence of Policy Optimization for Constrained MDPs

10/31/2021
by Tao Liu, et al.

We address the issue of safety in reinforcement learning by posing the problem as a discounted infinite-horizon constrained Markov decision process (CMDP). Existing results show that gradient-based methods achieve an 𝒪(1/√T) global convergence rate for both the optimality gap and the constraint violation. We exhibit a natural policy gradient-based algorithm that attains a faster 𝒪(log(T)/T) convergence rate for both quantities. When Slater's condition is satisfied and known a priori, zero constraint violation can additionally be guaranteed for sufficiently large T while maintaining the same convergence rate.
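To give a feel for the gap between the two rates, the following minimal sketch compares how many iterations T each bound needs before it drops below a target accuracy. It assumes every constant hidden by the 𝒪 notation equals 1, which the abstract does not state, so the numbers are illustrative only. The helper names (`sqrt_rate`, `log_rate`, `iterations_needed`) are hypothetical and are not taken from the paper.

```python
import math


def sqrt_rate(T: int) -> float:
    """Bound of the form 1/sqrt(T), with the hidden constant assumed to be 1."""
    return 1.0 / math.sqrt(T)


def log_rate(T: int) -> float:
    """Bound of the form log(T)/T, with the hidden constant assumed to be 1."""
    return math.log(T) / T


def iterations_needed(rate, eps: float, t_max: int = 2**60) -> int:
    """Return the first power of two T >= 2 at which rate(T) <= eps.

    Doubling search: coarse, but enough to compare orders of magnitude.
    """
    T = 2
    while rate(T) > eps:
        T *= 2
        if T > t_max:
            raise ValueError("target accuracy not reached within t_max")
    return T


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, iterations_needed(sqrt_rate, eps), iterations_needed(log_rate, eps))
```

Under these assumptions, the 1/√T bound needs on the order of 1/ε² iterations to reach accuracy ε, whereas the log(T)/T bound needs roughly (1/ε)·log(1/ε), so the difference widens quickly as ε shrinks.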
