Reinforcement Learning under Drift

06/07/2019
by Wang Chi Cheung, et al.

We propose algorithms with state-of-the-art dynamic regret bounds for un-discounted reinforcement learning under drifting non-stationarity, where both the reward functions and state transition distributions are allowed to evolve over time. Our main contributions are: 1) A tuned Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence-Widening (SWUCRL2-CW) algorithm, which attains low dynamic regret bounds against the optimal non-stationary policy in various cases. 2) The Bandit-over-Reinforcement Learning (BORL) framework that further permits us to enjoy these dynamic regret bounds in a parameter-free manner.
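The abstract only names the two ingredients of SWUCRL2-CW, sliding-window estimation and confidence widening. The following is a minimal, hypothetical Python sketch of how those ingredients could fit together; the window length W, the widening parameter eta, and the exact form of the confidence radii are illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch: sliding-window model estimates with widened
# confidence sets. Parameters W and eta are illustrative assumptions.
from collections import deque, defaultdict
import math


class SlidingWindowModel:
    """Keeps only the last W transitions and builds widened confidence sets."""

    def __init__(self, window_size, widening_eta, delta=0.05):
        self.W = window_size          # sliding-window length W
        self.eta = widening_eta       # confidence-widening parameter eta
        self.delta = delta            # confidence level
        self.buffer = deque(maxlen=window_size)  # recent (s, a, r, s') tuples

    def record(self, s, a, r, s_next):
        """Store one observed transition; old ones fall out of the window."""
        self.buffer.append((s, a, r, s_next))

    def confidence_region(self, s, a, num_states):
        """Empirical reward/transition estimates plus radii for (s, a)."""
        visits = [t for t in self.buffer if t[0] == s and t[1] == a]
        n = max(len(visits), 1)

        r_hat = sum(t[2] for t in visits) / n
        p_hat = defaultdict(float)
        for _, _, _, s_next in visits:
            p_hat[s_next] += 1.0 / n

        # Hoeffding-style radii computed over the window only, plus the
        # extra widening term eta that enlarges the transition set.
        r_radius = math.sqrt(math.log(2 * n / self.delta) / (2 * n))
        p_radius = math.sqrt(2 * num_states * math.log(2 * n / self.delta) / n) + self.eta
        return r_hat, r_radius, dict(p_hat), p_radius
```

An optimistic planner would then pick, within these widened sets, the model and policy maximizing long-run average reward; the BORL layer described in the abstract would sit on top and tune the window length online, so neither W nor the drift budgets need to be known in advance.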


Related research

06/24/2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
We consider un-discounted reinforcement learning (RL) in Markov decision...

05/05/2013
Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a ...

03/04/2019
Hedging the Drift: Learning to Optimize under Non-Stationarity
We introduce general data-driven decision-making algorithms that achieve...

09/09/2019
Recommendation System-based Upper Confidence Bound for Online Advertising
In this paper, the method UCB-RS, which resorts to recommendation system...

02/23/2018
On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
We study the non-stationary stochastic multiarmed bandit (MAB) problem a...

02/13/2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
We study episodic reinforcement learning under unknown adversarial corru...

03/16/2023
Online Reinforcement Learning in Periodic MDP
We study learning in periodic Markov Decision Process (MDP), a special t...
