Posterior Sampling for Deep Reinforcement Learning

04/30/2023
by   Remo Sasso, et al.
3

Despite remarkable successes, deep reinforcement learning algorithms remain sample inefficient: they require an enormous amount of trial and error to find good policies. Model-based algorithms promise sample efficiency by building an environment model that can be used for planning. Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence. PSDRL combines efficient uncertainty quantification over latent state space models with a specially tailored continual planning algorithm based on value-function approximation. Extensive experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling while being competitive with a state-of-the-art (model-based) reinforcement learning method, both in sample efficiency and computational efficiency.

READ FULL TEXT

page 13

page 19

page 20

research
10/23/2020

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

Sample efficiency has been one of the major challenges for deep reinforc...
research
02/24/2023

Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulat...
research
07/24/2021

Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?

We contribute to micro-data model-based reinforcement learning (MBRL) by...
research
02/12/2018

Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation

Modern reinforcement learning algorithms reach super-human performance i...
research
11/16/2022

Model Based Residual Policy Learning with Applications to Antenna Control

Non-differentiable controllers and rule-based policies are widely used f...
research
11/25/2019

Biologically inspired architectures for sample-efficient deep reinforcement learning

Deep reinforcement learning requires a heavy price in terms of sample ef...
research
09/08/2022

An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning

We study a posterior sampling approach to efficient exploration in const...

Please sign up or login with your details

Forgot password? Click here to reset