Provably Efficient Model-Free Algorithms for Non-stationary CMDPs

03/10/2023
by   Honghao Wei, et al.

We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected cumulative utility (cost). In the non-stationary environment, the reward functions, utility functions, and transition kernels can vary arbitrarily over time, as long as their cumulative variations do not exceed certain variation budgets. We propose the first model-free, simulator-free RL algorithms for non-stationary CMDPs that provably achieve sublinear regret and zero constraint violation, in both the tabular and linear function approximation settings. When the total variation budget is known, our regret and constraint-violation bounds for the tabular case match the best corresponding results for stationary CMDPs. Additionally, we present a general framework for addressing the well-known challenges associated with analyzing non-stationary CMDPs, without requiring prior knowledge of the variation budget. We apply this framework to both the tabular and linear function approximation settings.
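
For readers who want the problem in symbols, the setting described above can be sketched as follows. The notation is an assumption chosen for illustration (it follows common usage in the non-stationary RL literature and is not taken from the abstract): K episodes of horizon H, per-episode reward r_{k,h}, utility g_{k,h}, transition kernel P_{k,h}, a constraint threshold \rho, and reward and utility value functions V and W. The paper's exact definitions may differ.

```latex
% Assumed notation, for illustration only; not quoted from the paper.
% Per-episode constrained problem in episode k:
\begin{align*}
\pi_k^{*} \in \arg\max_{\pi}\;
  & \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} r_{k,h}(x_h, a_h)\Big]
  \quad \text{s.t.} \quad
  \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} g_{k,h}(x_h, a_h)\Big] \ge \rho .
\end{align*}

% One common way to measure non-stationarity (variation budgets):
\begin{align*}
B_r &= \sum_{k=1}^{K-1}\sum_{h=1}^{H} \max_{x,a}
       \big| r_{k,h}(x,a) - r_{k+1,h}(x,a) \big| , \\
B_g &= \sum_{k=1}^{K-1}\sum_{h=1}^{H} \max_{x,a}
       \big| g_{k,h}(x,a) - g_{k+1,h}(x,a) \big| , \\
B_p &= \sum_{k=1}^{K-1}\sum_{h=1}^{H} \max_{x,a}
       \big\| P_{k,h}(\cdot \mid x,a) - P_{k+1,h}(\cdot \mid x,a) \big\|_{1} , \\
B   &= B_r + B_g + B_p .
\end{align*}

% Dynamic regret and constraint violation of the played policies \pi_1,\dots,\pi_K:
\begin{align*}
\mathrm{Regret}(K)    &= \sum_{k=1}^{K}
   \Big( V_{k,1}^{\pi_k^{*}}(x_{k,1}) - V_{k,1}^{\pi_k}(x_{k,1}) \Big) , \\
\mathrm{Violation}(K) &= \sum_{k=1}^{K}
   \Big( \rho - W_{k,1}^{\pi_k}(x_{k,1}) \Big) .
\end{align*}
```

Under this reading, "sublinear regret" means Regret(K) grows more slowly than K, and "zero constraint violation" means Violation(K) is at most zero, typically for sufficiently large K.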


