1 Introduction
A classical Markov Decision Process (MDP) provides a formal description of a sequential decision making problem. Markov decision processes are a standard model for problems of decision making under uncertainty (Puterman (1994); Bertsekas and Tsitsiklis (1996)) and in particular for reinforcement learning. In the classical MDP model, the uncertainty is modeled by stochastic state-transition dynamics and reward functions, which however remain fixed throughout. In contrast, here we consider a setting in which both the transition dynamics and the reward functions are allowed to change over time. As a motivation, consider the problem of deciding which ads to place on a webpage. The instantaneous reward is the payoff when viewers are redirected to an advertiser, and the state captures the details of the current ad. With a heterogeneous group of viewers, an invariant state-transition function cannot accurately capture the transition dynamics. The instantaneous reward, which depends on external factors, is also better represented by changing reward functions. For more details on how this particular example fits our model, cf. Yu and Mannor (2009a), which studies a similar MDP problem, as well as Yu and Mannor (2009b) and Abbasi et al. (2013) for additional motivation and further practical applications of this problem setting.
1.1 Main contribution
For the mentioned switching-MDP problem setting, in which an adversary can make abrupt changes to the transition probabilities and reward distributions a certain number of times, we provide an algorithm called SWUcrl, a version of Ucrl2 (Jaksch et al. (2010)) that employs a sliding window to quickly adapt to potential changes. We derive a high-probability upper bound on the cumulative regret of our algorithm when the window size is adapted to the problem setting, including the number of changes. This improves upon the upper bound for Ucrl2 with restarts (Jaksch et al. (2010)) for the same problem in terms of the dependence on the problem parameters. Moreover, our algorithm also works without knowledge of the number of changes, although with a more convoluted regret bound, which will be specified later.
1.2 Related work
There exist several works on reinforcement learning in finite (non-changing) MDPs, including Burnetas and Katehakis (1997), Bartlett and Tewari (2009), and Jaksch et al. (2010), to mention only a few. MDPs in which the state-transition probabilities change arbitrarily but the reward functions remain fixed have been considered by Nilim and El Ghaoui (2005) and Xu and Mannor (2006). On the other hand, Even-Dar et al. (2005) and Dick et al. (2014) consider the problem of MDPs with fixed state-transition probabilities and changing reward functions. Moreover, Even-Dar et al. (2005, Theorem 11) also show that the case of MDPs with both changing state-transition probabilities and changing reward functions is computationally hard. Yu and Mannor (2009a) and Yu and Mannor (2009b) consider arbitrary changes in the reward functions and arbitrary, but bounded, changes in the state-transition probabilities. They also give regret bounds that scale with the proportion of changes in the state-transition kernel and that in the worst case grow linearly with time. Abbasi et al. (2013) consider MDP problems with (oblivious) adversarial changes in state-transition probabilities and reward functions and provide algorithms for minimizing the regret with respect to a comparison set of stationary (expert) policies. The MDP setting we consider is similar; however, our regret formalization is different in the sense that we consider the regret against an optimal non-stationary policy (across changes). This setting has already been considered by Jaksch et al. (2010), and we use the suggested Ucrl2 with restarts algorithm as a benchmark to compare our work with.
Sliding-window approaches to dealing with changing environments have been considered in other learning problems, too. In particular, Garivier and Moulines (2011) consider the problem of changing reward functions for multi-armed bandits and provide a variant of UCB (Auer et al. (2002)) using a sliding window.
1.3 Outline
The rest of the article is structured as follows. In Section 2, we formally define the problem at hand. This is followed by our algorithmic solution, SWUcrl, presented in Section 3, together with regret bounds and a sample complexity bound. Next, in Section 4, we analyze our algorithm, providing proofs for the regret bound. Section 5 provides some complementing experimental results, followed by some concluding discussion in Section 6.
2 Problem setting
In an MDP with a finite state space and a finite action space, the learner's task at each time step is to choose an action to execute in the current state. Upon executing the chosen action in the current state, the learner receives a reward drawn i.i.d. from some unknown distribution whose mean depends on the state-action pair, and the environment transitions into the next state, selected randomly according to the unknown state-transition probabilities.
In this article, we consider a setting in which reward distributions and state-transition probabilities are allowed to change (but not the state space and action space) at unknown time steps (called change-points henceforth). We call this setting a switching-MDP problem (following the naming of a similar multi-armed bandit setting by Garivier and Moulines (2011)). Neither the change-points nor the changes in reward distributions and state-transition probabilities depend on the previous behavior of the algorithm or on the history so far. It can be assumed that the change-points are set in advance by an oblivious adversary. Initially, a switching-MDP is in its first configuration, in which rewards are drawn from an unknown distribution and state transitions occur according to the corresponding unknown transition probabilities; after each change-point, it moves to the next configuration. Thus, a switching-MDP problem is completely defined by the state space, the action space, the sequence of configurations, and the change-points.
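As an illustration of the setting just defined, the following sketch simulates a switching-MDP. All names, the problem sizes, and the choice of Bernoulli rewards are our assumptions for concreteness, not part of the formal model:

```python
import numpy as np

def make_random_mdp(n_states, n_actions, rng):
    """Sample one MDP configuration: mean rewards and a transition kernel."""
    rewards = rng.random((n_states, n_actions))            # mean rewards in [0, 1]
    transitions = rng.random((n_states, n_actions, n_states))
    transitions /= transitions.sum(axis=2, keepdims=True)  # normalize each row
    return rewards, transitions

def run_switching_mdp(policy, horizon, change_points, n_states=5, n_actions=3, seed=0):
    """Simulate a switching-MDP: the active configuration changes at the
    (obliviously fixed) change-points; state and action spaces stay fixed."""
    rng = np.random.default_rng(seed)
    configs = [make_random_mdp(n_states, n_actions, rng)
               for _ in range(len(change_points) + 1)]
    state, total_reward, active = 0, 0.0, 0
    for t in range(horizon):
        if active < len(change_points) and t == change_points[active]:
            active += 1                                    # abrupt change of configuration
        rewards, transitions = configs[active]
        a = policy(state, t)
        total_reward += rng.binomial(1, rewards[state, a])  # Bernoulli reward with given mean
        state = rng.choice(n_states, p=transitions[state, a])
    return total_reward
```

A fixed round-robin policy such as `lambda s, t: t % 3` can serve as a trivial baseline when experimenting with this simulator.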
An algorithm attempting to solve a switching-MDP from an initial state chooses an action to execute at each time step, i.e., it follows a policy. A policy can either choose the same action for a particular state at any time step (stationary policy), or it might choose different actions for the same state when it is visited at different time steps (non-stationary policy). The sequence of the states visited by the algorithm, the actions chosen as decided by its policy, and the subsequent rewards received can be thought of as the result of a stochastic process.
As a performance measure, we use regret, which is used in various other learning paradigms as well. In order to arrive at the definition of the regret of an algorithm for a switching-MDP problem, let us define a few other terms. The average reward of a constituent MDP under a stationary policy is the limit of the expected average accumulated reward when an algorithm following that policy is run on the MDP from an initial state.
We note that for a given (fixed) MDP the optimal average reward is attained by a stationary policy and cannot be increased by using nonstationary policies.
Another intrinsic parameter of an MDP configuration is its diameter.
Definition 1.
(Diameter of an MDP) The diameter of an MDP $M$ is defined as
$$D(M) \;:=\; \max_{s \neq s'} \; \min_{\pi} \; \mathbb{E}\big[\,T(s' \mid M, \pi, s)\,\big],$$
where the random variable $T(s' \mid M, \pi, s)$ denotes the number of steps needed to reach state $s'$ from state $s$ in MDP $M$ for the first time, following policy $\pi$ from the set of feasible stationary policies. For MDPs with finite diameter, the optimal average reward does not depend on the initial state (Puterman (1994)). Thus, assuming finite diameter for all the constituent MDPs of a switching-MDP problem, the optimal average reward of a constituent MDP is defined as the maximum of its average reward over all stationary policies.
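To make the definition concrete, the diameter can be approximated numerically by solving, for every target state, the stochastic shortest-path problem via value iteration. The following sketch assumes a dense transition array and is an illustration of the definition, not part of the paper's analysis:

```python
import numpy as np

def diameter(transitions, tol=1e-8, max_iter=100_000):
    """Numerically estimate the diameter of an MDP: the maximum over state
    pairs (s, s') of the minimal expected time to reach s' from s, where the
    minimum is over stationary policies. `transitions` has shape (S, A, S)."""
    S = transitions.shape[0]
    worst = 0.0
    for target in range(S):
        h = np.zeros(S)                    # expected hitting times to `target`
        for _ in range(max_iter):
            # Bellman update: one step plus the expected hitting time of the
            # successor state; the target itself absorbs (hitting time 0).
            q = 1.0 + transitions @ h      # shape (S, A)
            h_new = q.min(axis=1)
            h_new[target] = 0.0
            if np.max(np.abs(h_new - h)) < tol:
                h = h_new
                break
            h = h_new
        worst = max(worst, h.max())
    return worst
```

For instance, a deterministic two-state cycle has diameter 1, and a deterministic three-state ring has diameter 2, which this routine recovers.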
With the above in hand, we can state that the regret of an algorithm for a switching-MDP problem is the sum of the missed rewards compared to the optimal average rewards of the constituent MDPs that are active at the respective time steps.
Definition 2.
(Regret for a switching-MDP problem) The regret of an algorithm operating on a switching-MDP problem for $T$ time steps and starting at an initial state is defined as
$$\Delta \;:=\; \sum_{t=1}^{T} \big(\rho^*_t - r_t\big),$$
where $r_t$ is the reward received at step $t$, and $\rho^*_t = \rho^*(M_i)$ if constituent MDP $M_i$ is active at time step $t$.
When it is clear from the context, we drop the subscripts and use the abbreviated notation for the regret.
3 Proposed algorithm: SWUCRL
Our proposed algorithm, called Sliding Window UCRL (SWUcrl), is a non-trivial modification of the Ucrl2 algorithm given by Jaksch et al. (2010). Unlike Ucrl2, SWUcrl only maintains the history of the most recent time steps, their number being called the window size. In a way, SWUcrl can be interpreted as sliding a window of this size across the filtration of the history.
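The windowed bookkeeping this requires can be sketched as follows. The class name and the exact constants in the confidence widths are our simplifications for illustration; the paper's Eq. (1) and Eq. (2) contain the precise expressions:

```python
import math
from collections import deque, defaultdict

class SlidingWindowStats:
    """Sketch of the statistics SWUcrl needs at the start of an episode:
    empirical mean rewards and transition frequencies computed from only
    the last `window` observations."""

    def __init__(self, window):
        self.window = window
        self.buffer = deque()                  # (s, a, r, s_next) tuples in order
        self.counts = defaultdict(int)         # N(s, a) within the window
        self.reward_sums = defaultdict(float)
        self.trans_counts = defaultdict(int)   # N(s, a, s') within the window

    def observe(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))
        self.counts[(s, a)] += 1
        self.reward_sums[(s, a)] += r
        self.trans_counts[(s, a, s_next)] += 1
        if len(self.buffer) > self.window:     # forget the oldest observation
            s0, a0, r0, sn0 = self.buffer.popleft()
            self.counts[(s0, a0)] -= 1
            self.reward_sums[(s0, a0)] -= r0
            self.trans_counts[(s0, a0, sn0)] -= 1

    def estimates(self, s, a, n_states, delta=0.05):
        """Windowed estimates plus schematic confidence widths."""
        n = max(1, self.counts[(s, a)])
        r_hat = self.reward_sums[(s, a)] / n
        p_hat = [self.trans_counts[(s, a, sn)] / n for sn in range(n_states)]
        conf_r = math.sqrt(math.log(2 / delta) / (2 * n))          # schematic width
        conf_p = math.sqrt(2 * n_states * math.log(2 / delta) / n) # schematic width
        return r_hat, p_hat, conf_r, conf_p
```

The eviction step in `observe` is what implements the "forgetting" of observations older than the window.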
At its core, SWUcrl works on the principle of “optimism in the face of uncertainty”. Like its predecessor Ucrl2, it proceeds in episodes divided into three phases. At the start of every episode, it assesses its performance over the past time steps and changes the policy, if necessary. More precisely (see Figure 1), during the initialization phase of an episode, it computes the estimates of the mean rewards for each state-action pair and of the state-transition probabilities for each state-action-state triplet from the observations within the window. In the policy computation phase, SWUcrl defines a set of MDPs which are statistically plausible given these estimates: the mean rewards and the state-transition probabilities of every MDP in this set are stipulated to be close to the estimated mean rewards and estimated state-transition probabilities, respectively. The corresponding confidence intervals are specified in Eq. (1) and Eq. (2). The algorithm then chooses an optimistic MDP from this set and uses extended value iteration (Jaksch et al., 2010) to select a near-optimal policy for it. In the last phase of the episode, this policy is executed. The lengths of the episodes are not fixed a priori, but depend upon the observations made so far in the current episode as well as the observations before the start of the episode. An episode ends when the number of occurrences of the current state-action pair in the episode equals the number of occurrences of the same state-action pair in the windowed observations before the start of the episode. It is worth restating that the estimates and confidence intervals are computed only from the observations within the window at the start of each episode. Not considering observations beyond the window is done with the intention of “forgetting” previously active MDP configurations. Note that due to the episode termination criterion, no episode can be longer than the window size.

The following theorem provides an upper bound on the regret of SWUcrl. The elements of its proof can be found in Section 4.
Theorem 1.
Given a switching-MDP with changes in the reward distributions and state-transition probabilities, with probability at least , it holds that for any initial state and any , the regret of SWUcrl using window size is bounded by
where .
From the above, one can compute the optimal value of the window size as follows:
(3) 
If the time horizon and the number of changes are known to the algorithm, then the window size can be set to its optimal value given by Eq. (3), and we get the following bound.
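To indicate where an optimal window size of this kind typically comes from, the trade-off can be sketched as follows. The constants $c_1, c_2$, the linear-in-$W$ change term, and the $T/\sqrt{W}$ estimation term are our assumptions for illustration, not the paper's exact bound:

```latex
% Schematic trade-off: regret caused by changes grows with the window size W,
% while regret from estimation within a window shrinks with W.
\Delta(W) \;\lesssim\; c_1\,\ell\,W \;+\; c_2\,\frac{T}{\sqrt{W}},
\qquad
\frac{d}{dW}\Big(c_1\,\ell\,W + c_2\,T\,W^{-1/2}\Big) = 0
\;\;\Longrightarrow\;\;
W^* \;=\; \Big(\frac{c_2\,T}{2\,c_1\,\ell}\Big)^{2/3} \;\propto\; \Big(\frac{T}{\ell}\Big)^{2/3}.
```

Under this schematic form, plugging $W^*$ back in makes both terms scale as $\ell^{1/3} T^{2/3}$, which is the usual shape of sliding-window bounds.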
Corollary 1.
Given a switching-MDP problem with and changes in the reward distributions and state-transition probabilities, the regret of SWUcrl using for any initial state and any is upper bounded by
with probability at least .
The proof of this corollary is detailed in Appendix III.
This bound improves upon the bound provided for Ucrl2 with restarts (Jaksch et al. (2010, Theorem 6)) in terms of the dependence on , and . Our bound features , and , while the provided bound for Ucrl2 with restarts features , and . We note, however, that it might be possible to get an improved bound for Ucrl2 with restarts using an optimized restarting schedule.
Finally, we also obtain the following PAC bound for our algorithm.
Corollary 2.
Given a switching-MDP problem with changes, with probability at least , the average per-step regret of SWUcrl using is at most after any steps with
The proof of this corollary is detailed in Appendix IV.
4 Analysis of Sliding Window UCRL
The regret can be split into two components: the regret incurred due to the changes in the MDP and the regret incurred while the MDP remains the same. By the definition of SWUcrl, a change in the MDP can only affect the episode in which the change occurs or the following episode. Due to the episode stopping criterion, the length of an episode can at most be equal to the window size. Hence, the first component is bounded by the number of changes times twice the window size.
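The decomposition just described can be written schematically. The symbols (number of changes $\ell$, window size $W$, and the two regret components) are our labels, and the bound assumes rewards in $[0,1]$ so that the per-step regret is at most $1$:

```latex
\Delta \;=\; \Delta_{\mathrm{change}} + \Delta_{\mathrm{same}},
\qquad
\Delta_{\mathrm{change}}
\;\le\;
\underbrace{\ell}_{\text{changes}}
\cdot
\underbrace{2}_{\substack{\text{affected}\\ \text{episodes}}}
\cdot
\underbrace{W}_{\substack{\text{max episode}\\ \text{length}}}
\;=\; 2\,\ell\,W,
```

since each change can affect only the episode in which it occurs and the following one, and no episode is longer than $W$ steps.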
Now, we compute the regret in the episodes in which the MDP does not change. This computation is similar to the analysis of Ucrl2 in (Jaksch et al., 2010, Section 4). We define the regret in an episode in which the switching-MDP does not change its configuration as
Then, considering only episodes which are not affected by changes, one can show that
(4) 
with probability at least , where is the respective number of episodes up to the considered time step.
Denoting the unchanged MDP in episode as , with probability at least ,
(5) 
Furthermore, as for the derivations of (4) and (5), following the proof of Jaksch et al. (2010), one can show that
To proceed from here, we make use of the following novel lemmas which present some challenges related to handling the limitation of history to the sliding window.
Lemma 1.
Provided that , the number of episodes of SWUcrl up to the considered time step is upper bounded as
The proof of Lemma 1 is given in Appendix I. Here we only provide the key idea behind the proof. We argue that the number of episodes in a batch is maximal if the state-action counts at the first step of the batch are all . Summing up this maximal number of episodes over the batches gives the claimed bound.
Lemma 2.
The detailed proof for Lemma 2 is given in Appendix II. Here, we provide a brief overview of the proof.
Proof sketch. Divide the time horizon into batches such that the first batch starts at the first time step and each batch ends with the earliest episode termination after the batch size has reached the window size. Then the size of each batch, and hence the number of batches, is bounded accordingly. Counting, for each state-action pair, its occurrences within the current batch when an episode starts as well as its total number of occurrences in the batch, we have
The first inequality follows from Proposition 1 given in Appendix II, while the second inequality follows from Jensen's inequality. ∎
5 Experiments
For practical evaluation, we generated switching-MDPs of fixed size. The changes are set to happen at regular intervals of time steps. This simple setting can be motivated by the ad example given in Section 1, in which changes happen at regular intervals.
For SWUcrl, the window size was chosen to be the optimal one as given by Eq. (3), using a lower bound of for the diameter. For comparison, we used two algorithms: Ucrl2 with restarts as given in Jaksch et al. (2010) (referred to as Ucrl2R henceforth) and Ucrl2 with restarts after a fixed number of time steps (referred to as Ucrl2RW henceforth). Note that the latter restarting schedule is a modification by us, not provided by Jaksch et al. (2010). SWUcrl, Ucrl2R, and Ucrl2RW were run on randomly generated switching-MDP problems with random rewards and state-transition probabilities.
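A sketch of the evaluation loop used for such comparisons follows. The function names and the interface (an algorithm runner that returns its per-step rewards, and the active optimal average reward at each step) are hypothetical; the paper's actual experiment code is not reproduced here:

```python
import numpy as np

def average_regret_curves(run_algorithm, n_runs, horizon, optimal_avg_reward):
    """Run an algorithm several times on independently drawn switching-MDPs
    and average the per-step regret curves.
    `run_algorithm(seed)` is assumed to return the sequence of collected
    rewards; `optimal_avg_reward[t]` is the optimal average reward of the
    configuration active at step t."""
    curves = np.zeros((n_runs, horizon))
    for i in range(n_runs):
        rewards = np.asarray(run_algorithm(seed=i))
        regret = np.cumsum(optimal_avg_reward - rewards)   # missed reward so far
        curves[i] = regret / np.arange(1, horizon + 1)     # average per-step regret
    return curves.mean(axis=0)
```

Plotting the returned curve against time makes the "bumps" at the change-points directly visible: the curve flattens while a configuration is being learned and rises again after each change.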
Figure (a) shows the average regret for the smaller number of changes and Figure (b) for the larger number. Clearly noticeable in both plots (at least for SWUcrl and our modification, Ucrl2RW) are the “bumps” in the regret curves at the time steps where the changes occur. This behaviour is expected, as it shows that the algorithms were learning the MDP configuration, indicated by the regret curves beginning to flatten, before a change to another MDP results in an ascent of the regret curves. Ucrl2R and Ucrl2RW give only slightly worse performance when the number of changes is small. However, even for a moderate number of changes, SWUcrl and our modification Ucrl2RW are observed to give better performance than Ucrl2R. In both cases, our proposed algorithm gives improved performance over Ucrl2RW.
6 Discussion and Further Directions
The theoretical performance guarantee and the experimental results demonstrate that the algorithm introduced in this article, SWUcrl, provides a competent solution to the task of regret minimization in MDPs with arbitrarily changing rewards and state-transition probabilities. We have also provided a sample complexity bound on the number of suboptimal steps taken by SWUcrl.
We conjecture that the sample complexity bound can be used to provide a variation-dependent regret bound, although the proof might present a few technical difficulties when handling the sliding-window aspect of the algorithm. A related question is to establish a link between the extent of allowable variation in rewards and state-transition probabilities and the minimal achievable regret, as was done recently for the problem of multi-armed bandits with non-stationary rewards in Besbes et al. (2014). Another direction is to refine the episode-stopping criterion so that a new policy is computed only when the currently employed policy performs below a suitable reference value.
References

Abbasi et al. [2013] Yasin Abbasi, Peter L. Bartlett, Varun Kanade, Yevgeny Seldin, and Csaba Szepesvári. Online learning in Markov decision processes with adversarially chosen transition probability distributions. In Advances in Neural Information Processing Systems 26, pages 2508–2516. Curran Associates, Inc., 2013.

Auer et al. [2002] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, May 2002.

Bartlett and Tewari [2009] Peter L. Bartlett and Ambuj Tewari. REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI ’09, pages 35–42, Arlington, Virginia, United States, 2009. AUAI Press.

Bertsekas and Tsitsiklis [1996] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

Besbes et al. [2014] Omar Besbes, Yonatan Gur, and Assaf Zeevi. Stochastic multi-armed-bandit problem with non-stationary rewards. In Advances in Neural Information Processing Systems 27, pages 199–207. Curran Associates, Inc., 2014.

Burnetas and Katehakis [1997] Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222–255, February 1997.

Dick et al. [2014] Travis Dick, András György, and Csaba Szepesvári. Online learning in Markov decision processes with changing cost sequences. In Proceedings of the International Conference on Machine Learning, pages 512–520, 2014.

Even-Dar et al. [2005] Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Experts in a Markov decision process. In Advances in Neural Information Processing Systems 17, pages 401–408. MIT Press, 2005.

Garivier and Moulines [2011] Aurélien Garivier and Eric Moulines. On upper-confidence bound policies for switching bandit problems. In Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT ’11, pages 174–188, Berlin, Heidelberg, 2011. Springer-Verlag.

Jaksch et al. [2010] Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563–1600, August 2010.

Nilim and El Ghaoui [2005] Arnab Nilim and Laurent El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798, September 2005.

Puterman [1994] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, 1994.

Xu and Mannor [2006] Huan Xu and Shie Mannor. The robustness-performance tradeoff in Markov decision processes. In NIPS, pages 1537–1544. MIT Press, 2006.

Yu and Mannor [2009a] Jia Yuan Yu and Shie Mannor. Arbitrarily modulated Markov decision processes. In Proceedings of the IEEE Conference on Decision and Control, pages 2946–2953, December 2009.

Yu and Mannor [2009b] Jia Yuan Yu and Shie Mannor. Online learning in Markov decision processes with arbitrarily changing rewards and transitions. In 2009 International Conference on Game Theory for Networks, pages 314–322, May 2009.
Appendix I Proof of Lemma 1
Proof.
Divide the time steps into batches of equal size (with the possible exception of the last batch). For each of these batches we consider the maximal number of episodes contained in this batch. Obviously, the maximal number of episodes is obtained greedily if each episode is as short as possible. For each time step with given state-action counts in the window reaching back from it, the shortest possible episode starting at that step (according to the episode termination criterion) will consist of repeated visits to a fixed state-action pair contained in the window.
Accordingly, in a window of the given size, the number of episodes is largest if the state-action counts at the first step of the batch are all . For this case we know (cf. the respective lemma of Jaksch et al. [2010]) that the number of episodes within a batch is bounded accordingly. Summing up over all batches gives the claimed bound.
∎
Appendix II Technical Details for the proof of Lemma 2
A Proof of Lemma 2
Proof.
We prove this lemma by dividing the time horizon into batches (different from those used in the proof of Lemma 1) as follows. The first batch starts at the first time step, and each batch ends with the earliest episode termination after the batch size has reached the window size. That way, each episode is completely contained in one batch. As any episode can be at most as long as the window size, the size of each batch is bounded accordingly, and so is the number of batches.
Let us denote the set of episodes in batch b and, for each state-action pair, the number of its occurrences in the current batch at the time an episode starts. Furthermore, for each state-action pair, consider the number of its occurrences in the whole batch, with the initial counts set to zero.
We have
In the above, the first inequality follows from applying Proposition 1 with the respective batch quantities, while the second inequality follows from Jensen's inequality. ∎
B Proposition required to prove Lemma 2
Proposition 1.
For any nonnegative integers with the following properties
(10) 
(11) 
(12) 
(13) 
(14) 
it holds that,
Proof.
We now prove the proposition by induction over .
Base case:
The first equality is true because and the last inequality is true because

if , then , and and the RHS is nonnegative since all and are nonnegative integers.

if , then and using (12).
Inductive step: