Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

05/27/2019
by Aristide Tossou, et al.

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves a regret of Õ(√(DSAT)), where D is the diameter of the MDP, S the number of states, A the number of actions, and T the time horizon. This matches the known lower bound up to logarithmic factors, and so our work closes the gap without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show UCRL-V to outperform state-of-the-art algorithms.
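The variance-based confidence intervals referenced above are typically instances of an empirical Bernstein bound, which replaces the worst-case range term in a Hoeffding interval with the empirical variance of the observed samples, so the interval shrinks faster for low-variance quantities. Below is a minimal Python sketch of a Maurer–Pontil-style empirical Bernstein confidence radius for i.i.d. samples in [0, 1]; the function name and the delta parameter are illustrative, and this is a generic bound rather than the exact confidence bonus constructed inside UCRL-V.

import math

def empirical_bernstein_radius(samples, delta):
    """Maurer-Pontil-style empirical Bernstein confidence radius.

    For i.i.d. samples in [0, 1], with probability at least 1 - delta
    the true mean lies within `radius` of the empirical mean. Generic
    illustration only, not the paper's exact construction.
    """
    n = len(samples)
    if n < 2:
        return float("inf")  # the bound needs at least two samples
    mean = sum(samples) / n
    # Unbiased sample variance of the observations.
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    # Variance term dominates for large n; the second term corrects
    # for estimating the variance from the same samples.
    return math.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))

# Example: a 95% interval from 100 binary observations.
observations = [0.0, 1.0, 1.0, 0.0, 1.0] * 20
print(empirical_bernstein_radius(observations, delta=0.05))

Because the radius scales with the empirical variance rather than the worst-case range, near-deterministic transitions receive much tighter optimistic bonuses; informally, this is the mechanism that lets variance-aware algorithms such as UCRL-V improve on range-based confidence sets.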


