Dynamic Regret of Policy Optimization in Non-stationary Environments

06/30/2020
by Yingjie Fei, et al.
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels. We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret. Compared with the classical notion of static regret, dynamic regret is a stronger notion as it explicitly accounts for the non-stationarity of environments. The dynamic regret attained by the proposed algorithms interpolates between different regimes of non-stationarity, and moreover satisfies a notion of adaptive (near-)optimality, in the sense that it matches the (near-)optimal static regret under slow-changing environments. The dynamic regret bound features two components, one arising from exploration, which deals with the uncertainty of transition kernels, and the other arising from adaptation, which deals with non-stationary environments. Specifically, we show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction. To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments.
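
For concreteness, the two regret notions contrasted above can be written out as follows. This is a sketch in notation assumed here rather than taken from the abstract: $K$ is the number of episodes, $\pi^k$ is the policy the learner plays in episode $k$, $V_1^{\pi,k}(s_1)$ is the value of policy $\pi$ from the initial state $s_1$ under the reward function of episode $k$, and $\pi^{*,k}$ is a best policy for episode $k$ taken alone.

```latex
\begin{align*}
\mathrm{Regret}_{\mathrm{static}}(K)
  &= \max_{\pi} \sum_{k=1}^{K}
     \left[ V_1^{\pi,k}(s_1) - V_1^{\pi^k,k}(s_1) \right], \\
\mathrm{Regret}_{\mathrm{dynamic}}(K)
  &= \sum_{k=1}^{K}
     \left[ V_1^{\pi^{*,k},k}(s_1) - V_1^{\pi^k,k}(s_1) \right],
  \qquad \pi^{*,k} \in \arg\max_{\pi} V_1^{\pi,k}(s_1).
\end{align*}
```

Under these assumed definitions, the static comparator is a single fixed policy while the dynamic comparator may change every episode. Because a sum of per-episode maxima is at least the maximum of the sum, dynamic regret is never smaller than static regret, which is the sense in which it is the stronger notion.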


research
10/07/2020

Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs

We consider model-free reinforcement learning (RL) in non-stationary Mar...
research
11/19/2022

Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, and Separation Design

We study risk-sensitive reinforcement learning (RL) based on an entropic...
research
06/01/2023

Non-stationary Reinforcement Learning under General Function Approximation

General function approximation is a powerful tool to handle large state ...
research
03/10/2023

Provably Efficient Model-Free Algorithms for Non-stationary CMDPs

We study model-free reinforcement learning (RL) algorithms in episodic n...
research
03/04/2019

Hedging the Drift: Learning to Optimize under Non-Stationarity

We introduce general data-driven decision-making algorithms that achieve...
research
12/28/2017

Online Ensemble Multi-kernel Learning Adaptive to Non-stationary and Adversarial Environments

Kernel-based methods exhibit well-documented performance in various nonl...
research
05/10/2022

Universal Caching

In the learning literature, the performance of an online policy is commo...
