Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

01/28/2022
by Yuhao Ding, et al.

We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a setting central to ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms for time-varying environments is particularly challenging because of the need to integrate constraint-violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we propose a Periodically Restarted Optimistic Primal-Dual Proximal Policy Optimization (PROPD-PPO) algorithm that features three mechanisms: periodic-restart-based policy improvement, dual update with dual regularization, and periodic-restart-based optimistic policy evaluation. We establish a dynamic regret bound and a constraint violation bound for the proposed algorithm in both the linear kernel CMDP function approximation setting and the tabular CMDP setting. This paper provides the first provably efficient algorithm for non-stationary CMDPs with safe exploration.
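To make the three mechanisms concrete, below is a minimal, self-contained sketch of the periodically-restarted optimistic primal-dual pattern on a toy tabular CMDP. It is not the authors' implementation: the toy environment, the hyperparameters (restart period W, primal step size eta, dual step size, dual regularizer, constraint threshold b), and the exact backward-induction evaluation (standing in for the paper's bonus-based optimistic policy evaluation) are all illustrative assumptions.

```python
import numpy as np

S, A, H = 4, 2, 5                       # toy sizes: states, actions, horizon
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(S), size=(S, A))   # fixed transitions: P[s, a] is a distribution over next states
r_base = rng.random((S, A))                  # base reward table
g_base = rng.random((S, A))                  # base utility (constraint) table

def tables(t):
    """Stand-in for non-stationarity: reward/utility drift smoothly with episode t."""
    drift = 0.75 + 0.25 * np.sin(2 * np.pi * t / 500.0)
    return drift * r_base, (2.0 - drift) * g_base

def evaluate(pi, table):
    """Exact backward induction on the toy model. PROPD-PPO instead uses
    optimistic policy evaluation: value estimates plus an exploration bonus."""
    V = np.zeros(S)
    Q = np.zeros((H, S, A))
    for h in reversed(range(H)):
        Q[h] = table + P @ V                 # Q_h(s,a) = table(s,a) + E_{s'~P(.|s,a)}[V_{h+1}(s')]
        V = (pi[h] * Q[h]).sum(axis=1)       # V_h(s) = <pi_h(.|s), Q_h(s,.)>
    return Q, V

# illustrative hyperparameters: restart period W, primal/dual steps, regularizer, threshold b
W, eta, dual_lr, dual_reg, b = 200, 0.1, 0.05, 0.01, 1.5

pi = np.full((H, S, A), 1.0 / A)             # uniform initial policy
dual = 0.0                                   # Lagrange multiplier
for t in range(2000):
    if t % W == 0:                           # periodic restart: discard the stale
        pi = np.full((H, S, A), 1.0 / A)     # policy and multiplier so the learner
        dual = 0.0                           # can track the drifting CMDP
    r, g = tables(t)
    Qr, _ = evaluate(pi, r)
    Qg, Vg = evaluate(pi, g)
    # primal step: mirror-ascent (softmax) update on the Lagrangian Q = Qr + dual * Qg
    pi = pi * np.exp(eta * (Qr + dual * Qg))
    pi /= pi.sum(axis=2, keepdims=True)
    # dual step with regularization: projected gradient step on the violation
    # b - V_g(s0), shrunk toward zero so the multiplier stays bounded under drift
    dual = max(0.0, dual + dual_lr * (b - Vg[0]) - dual_reg * dual)
```

In this reading, the dual regularizer plays the stabilizing role: without the `- dual_reg * dual` shrinkage, a drifting constraint could drive the multiplier arbitrarily large, while the periodic restart serves as the forgetting mechanism that adapts to the (budgeted) variation.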


Related research

10/18/2021 · Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
We study episodic reinforcement learning (RL) in non-stationary linear k...

10/29/2019 · Constrained Reinforcement Learning Has Zero Duality Gap
Autonomous agents must often deal with conflicting requirements, such as...

03/30/2022 · Factored Adaptation for Non-Stationary Reinforcement Learning
Dealing with non-stationarity in environments (i.e., transition dynamics...

01/27/2023 · Safe Posterior Sampling for Constrained MDPs with Bounded Constraint Violation
Constrained Markov decision processes (CMDPs) model scenarios of sequent...

03/01/2020 · Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
We study the Safe Reinforcement Learning (SRL) problem using the Constra...

02/04/2023 · Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments
We study online Reinforcement Learning (RL) in non-stationary input-driv...

06/04/2021 · Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs
We address the issue of safety in reinforcement learning. We pose the pr...
