Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time

05/24/2023
by   Xiang Ji, et al.
0

A crucial problem in reinforcement learning is learning the optimal policy. We study this in tabular infinite-horizon discounted Markov decision processes under the online setting. The existing algorithms either fail to achieve regret optimality or have to incur a high memory and computational cost. In addition, existing optimal algorithms all require a long burn-in time in order to achieve optimal sample efficiency, i.e., their optimality is not guaranteed unless sample size surpasses a high threshold. We address both open problems by introducing a model-free algorithm that employs variance reduction and a novel technique that switches the execution policy in a slow-yet-adaptive manner. This is the first regret-optimal model-free algorithm in the discounted setting, with the additional benefit of a low burn-in time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...
research
04/22/2023

Reinforcement Learning with an Abrupt Model Change

The problem of reinforcement learning is considered where the environmen...
research
10/09/2021

Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning

Achieving sample efficiency in online episodic reinforcement learning (R...
research
12/12/2016

Online Reinforcement Learning for Real-Time Exploration in Continuous State and Action Markov Decision Processes

This paper presents a new method to learn online policies in continuous ...
research
03/07/2021

The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

Some reinforcement learning methods suffer from high sample complexity c...
research
05/05/2021

Model-free policy evaluation in Reinforcement Learning via upper solutions

In this work we present an approach for building tight model-free confid...
research
02/21/2017

Sample Efficient Policy Search for Optimal Stopping Domains

Optimal stopping problems consider the question of deciding when to stop...

Please sign up or login with your details

Forgot password? Click here to reset