Non-stationary Reinforcement Learning under General Function Approximation

06/01/2023
by Songtao Feng, et al.
General function approximation is a powerful tool for handling large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret of the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better than the existing UCB-type algorithms in the small variation budget scenario. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
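
For readers unfamiliar with the term, dynamic regret in episodic non-stationary MDPs is commonly defined as below. This is a sketch under standard conventions, not the paper's own statement: the symbols K (number of episodes), \pi^k (policy played in episode k), s_1^k (initial state of episode k), and the value-function notation are all assumptions introduced here for illustration.

```latex
% Assumed notation: V_1^{*,k} is the optimal value function of the MDP in
% effect during episode k, and V_1^{\pi^k,k} is the value of the learner's
% policy \pi^k in that same MDP.
\[
  \mathrm{D\text{-}Regret}(K)
  \;=\;
  \sum_{k=1}^{K}
  \Bigl( V_{1}^{*,k}\bigl(s_1^k\bigr) - V_{1}^{\pi^k,k}\bigl(s_1^k\bigr) \Bigr)
\]
```

Unlike static regret, the comparator here is the optimal policy of each episode's own MDP, so the benchmark can change from episode to episode. This is why a bound of this kind typically depends on how much the environment varies over time, the quantity the abstract refers to as the variation budget.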

research
10/07/2020

Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs

We consider model-free reinforcement learning (RL) in non-stationary Mar...
research
10/23/2020

Towards Safe Policy Improvement for Non-Stationary MDPs

Many real-world sequential decision-making problems involve critical sys...
research
06/24/2020

Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision...
research
10/18/2021

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs

We study episodic reinforcement learning (RL) in non-stationary linear k...
research
06/30/2020

Dynamic Regret of Policy Optimization in Non-stationary Environments

We consider reinforcement learning (RL) in episodic MDPs with adversaria...
research
03/28/2022

Composite Anderson acceleration method with dynamic window-sizes and optimized damping

In this paper, we propose and analyze a set of fully non-stationary Ande...
research
02/10/2021

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

We propose a black-box reduction that turns a certain reinforcement lear...
