Variational Regret Bounds for Reinforcement Learning

05/14/2019
by   Pratik Gajane, et al.

We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.

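To make the abstract's notion of "total variation in the MDP" concrete, here is a minimal numerical sketch in Python. It uses one common formalization of variation for a non-stationary MDP (the summed maximal per-step change in mean rewards plus the summed maximal per-step L1 change in transition probabilities); the paper's exact definition may differ in norms or constants, and the array shapes and the function name mdp_variation below are illustrative assumptions, not taken from the paper.

import numpy as np

def mdp_variation(rewards, transitions):
    # rewards:     shape (T, S, A),    rewards[t, s, a]         = r_t(s, a)
    # transitions: shape (T, S, A, S), transitions[t, s, a, s'] = p_t(s' | s, a)
    # Reward variation: sum over t of the largest per-(s, a) reward change.
    reward_var = sum(
        np.max(np.abs(rewards[t + 1] - rewards[t]))
        for t in range(len(rewards) - 1)
    )
    # Transition variation: sum over t of the largest per-(s, a)
    # L1 distance between consecutive transition distributions.
    transition_var = sum(
        np.max(np.abs(transitions[t + 1] - transitions[t]).sum(axis=-1))
        for t in range(len(transitions) - 1)
    )
    return reward_var + transition_var

# Illustrative usage with random data (shapes chosen for the example only).
T, S, A = 100, 5, 3
rng = np.random.default_rng(0)
rewards = rng.random((T, S, A))
transitions = rng.dirichlet(np.ones(S), size=(T, S, A))  # shape (T, S, A, S)
print(mdp_variation(rewards, transitions))

A regret bound stated in terms of such a variation term adapts to how much the environment actually changes: if the MDP is nearly stationary, the variation is close to zero and the guarantee is correspondingly tighter.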

Related research

Efficient Learning in Non-Stationary Linear Markov Decision Processes (10/24/2020)
We study episodic reinforcement learning in non-stationary linear (a.k.a...

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces (07/09/2020)
In this work, we propose KeRNS: an algorithm for episodic reinforcement ...

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions (05/25/2018)
We consider reinforcement learning in changing Markov Decision Processes...

Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes (04/01/2023)
We consider the problem of learning in a non-stationary reinforcement le...

Thompson Sampling for Learning Parameterized Markov Decision Processes (06/29/2014)
We consider reinforcement learning in parameterized Markov Decision Proc...

Transfer in Reinforcement Learning via Regret Bounds for Learning Agents (02/02/2022)
We present an approach for the quantification of the usefulness of trans...

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning (02/13/2021)
We study episodic reinforcement learning under unknown adversarial corru...
