Near-optimal Reinforcement Learning using Bayesian Quantiles

06/20/2019
by Aristide Tossou, et al.

We study model-based reinforcement learning in finite communicating Markov Decision Processes. Algorithms in this setting have been developed along two different lines: the first, which typically provides frequentist performance guarantees, uses optimism in the face of uncertainty as its guiding algorithmic principle; the second is based on Bayesian reasoning, combined with posterior sampling and Bayesian guarantees. In this paper, we develop a conceptually simple algorithm, Bayes-UCRL, that combines the benefits of both approaches to achieve state-of-the-art performance for finite communicating MDPs. In particular, we use a Bayesian prior similarly to posterior sampling. However, instead of sampling an MDP, we construct an optimistic MDP using the quantiles of the Bayesian prior. We show that this technique enjoys a high-probability worst-case regret of order Õ(√(DSAT)). Experiments in a diverse set of environments show that our algorithm outperforms previous methods.
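The abstract only sketches the quantile construction, so the snippet below illustrates the core idea in Python. Everything in it is our own assumption rather than the paper's actual construction: Bernoulli rewards with conjugate Beta posteriors, a fixed (1 − δ) quantile, and optimism over rewards only. The full algorithm would also need an optimistic transition model (e.g. from Dirichlet posteriors) and extended value iteration to extract a policy.

```python
import numpy as np
from scipy import stats

S, A = 5, 2      # illustrative state and action counts (assumed)
DELTA = 0.05     # confidence parameter; we take the (1 - DELTA) quantile

# Beta(1, 1) (uniform) prior on the mean reward of each state-action pair.
alpha = np.ones((S, A))
beta = np.ones((S, A))

def observe(s, a, r):
    """Conjugate posterior update after a Bernoulli reward r in {0, 1}."""
    alpha[s, a] += r
    beta[s, a] += 1 - r

def optimistic_reward_model():
    """Per-(s, a) upper quantile of the reward posterior.

    Posterior sampling would instead draw np.random.beta(alpha, beta);
    taking a deterministic high quantile of the same posterior yields an
    optimistic reward model in the spirit of UCRL, which could then be
    fed to (extended) value iteration to compute a policy.
    """
    return stats.beta.ppf(1.0 - DELTA, alpha, beta)
```

The contrast with posterior sampling is the single line in `optimistic_reward_model`: both methods maintain the same posterior, but one samples from it while the other deterministically takes an upper quantile, which is what makes the resulting MDP optimistic with high probability.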


Related research

05/27/2019 · Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities
We study model-based reinforcement learning in an unknown finite communi...

06/20/2019 · Near-optimal Bayesian Solution For Unknown Discrete Markov Decision Process
We tackle the problem of acting in an unknown finite and discrete Markov...

10/20/2022 · Model-based Lifelong Reinforcement Learning with Bayesian Exploration
We propose a model-based lifelong reinforcement-learning approach that e...

09/28/2022 · Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
We consider reinforcement learning in an environment modeled by an episo...

02/19/2021 · Randomized Exploration is Near-Optimal for Tabular MDP
We study exploration using randomized value functions in Thompson Sampli...

10/21/2022 · Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables
One key challenge for multi-task Reinforcement learning (RL) in practice...

09/28/2019 · Accelerating the Computation of UCB and Related Indices for Reinforcement Learning
In this paper we derive an efficient method for computing the indices as...
