(More) Efficient Reinforcement Learning via Posterior Sampling

06/04/2013
by Ian Osband, et al.

Most provably efficient learning algorithms introduce optimism about poorly understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates its posterior distribution over Markov decision processes using the data observed so far and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an Õ(τ S √(AT)) bound on the expected regret, where T is time, τ is the episode length and S and A are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
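
The episode loop described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a tabular MDP, a Dirichlet(1, ..., 1) prior on each transition row, Bernoulli rewards with a Beta(1, 1) prior, and a fixed start state 0. The names `sample_mdp`, `plan`, `psrl` and the `env_step` callback are hypothetical and chosen only for this example.

```python
import random


def sample_mdp(trans_counts, reward_counts, rng):
    """Draw one MDP from the posterior (assumed Dirichlet/Beta conjugate model)."""
    S, A = len(trans_counts), len(trans_counts[0])
    P = [[None] * A for _ in range(S)]
    R = [[0.0] * A for _ in range(S)]
    for s in range(S):
        for a in range(A):
            # A Dirichlet sample is a vector of Gamma draws, normalized to sum to 1.
            g = [rng.gammavariate(c, 1.0) for c in trans_counts[s][a]]
            total = sum(g)
            P[s][a] = [x / total for x in g]
            successes, failures = reward_counts[s][a]
            R[s][a] = rng.betavariate(successes, failures)
    return P, R


def plan(P, R, tau):
    """Backward induction over tau steps; returns policy[t][s] for the given MDP."""
    S, A = len(P), len(P[0])
    V = [0.0] * S
    policy = [[0] * S for _ in range(tau)]
    for t in reversed(range(tau)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(A)]
            best = max(range(A), key=q.__getitem__)
            policy[t][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy


def psrl(env_step, S, A, tau, episodes, seed=0):
    """Run PSRL: sample an MDP, act optimally for it for one episode, update counts."""
    rng = random.Random(seed)
    # Prior pseudo-counts (assumption: uniform Dirichlet and Beta(1, 1)).
    trans_counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    reward_counts = [[[1.0, 1.0] for _ in range(A)] for _ in range(S)]
    total_reward = 0.0
    for _ in range(episodes):
        P, R = sample_mdp(trans_counts, reward_counts, rng)  # one posterior sample
        policy = plan(P, R, tau)  # optimal policy for the sampled MDP
        s = 0  # assumed fixed initial state
        for t in range(tau):
            a = policy[t][s]
            s_next, r = env_step(s, a, rng)  # r is 0 or 1 in this sketch
            trans_counts[s][a][s_next] += 1.0
            reward_counts[s][a][0 if r else 1] += 1.0
            total_reward += r
            s = s_next
    return total_reward, trans_counts, reward_counts
```

Here `env_step(s, a, rng)` stands for the unknown environment and should return a `(next_state, reward)` pair. Exploration comes only from the randomness of the posterior sample: no optimism bonus is added, and the sampled policy is held fixed for the whole episode.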

research
07/01/2016

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

Computational results demonstrate that posterior sampling for reinforcem...
research
09/08/2022

An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning

We study a posterior sampling approach to efficient exploration in const...
research
11/29/2022

Posterior Sampling for Continuing Environments

We develop an extension of posterior sampling for reinforcement learning...
research
02/05/2018

Coordinated Exploration in Concurrent Reinforcement Learning

We consider a team of reinforcement learning agents that concurrently le...
research
11/15/2021

Delayed Feedback in Episodic Reinforcement Learning

There are many provably efficient algorithms for episodic reinforcement ...
research
05/07/2020

Reinforcement Learning with Feedback Graphs

We study episodic reinforcement learning in Markov decision processes wh...
research
06/30/2020

Provably More Efficient Q-Learning in the Full-Feedback/One-Sided-Feedback Settings

We propose two new Q-learning algorithms, Full-Q-Learning (FQL) and Elim...
