Efficient Exploration through Bayesian Deep Q-Networks

02/13/2018
by Kamyar Azizzadenesheli, et al.

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson-sampling-based reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling, but it is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network, through a Bayesian linear regression (BLR) model. This layer can be trained with fast closed-form updates, and its samples can be drawn efficiently from a Gaussian distribution. We apply our method to a wide range of Atari games in the Arcade Learning Environment. Because BDQN explores more efficiently, it reaches higher rewards substantially faster than a key baseline, the double deep Q-network (DDQN).
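The idea of a BLR output layer with closed-form updates and Gaussian sampling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `blr_posterior` and `thompson_action`, the zero-mean isotropic prior, the hyperparameters `noise_var` and `prior_var`, and the use of one independent weight vector per action are all assumptions made for illustration. The feature vectors stand in for the output of the network's last hidden layer, which the abstract does not specify.

```python
import numpy as np


def blr_posterior(features, targets, noise_var=1.0, prior_var=1.0):
    """Closed-form Gaussian posterior over weights w.

    Assumed model (illustrative, not taken from the paper):
        targets ~ N(features @ w, noise_var * I),  w ~ N(0, prior_var * I)
    """
    dim = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(dim) / prior_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove round-off asymmetry before sampling
    mean = cov @ features.T @ targets / noise_var
    return mean, cov


def thompson_action(state_features, posteriors, rng):
    """Draw one weight vector per action, then act greedily on the sampled Q-values."""
    sampled_q = [
        rng.multivariate_normal(mean, cov) @ state_features
        for mean, cov in posteriors
    ]
    return int(np.argmax(sampled_q))
```

In use, one would call `blr_posterior` once per action on the features and regression targets collected for that action, then call `thompson_action` with the current state's feature vector and a `np.random.default_rng()` generator. Actions whose posterior is still wide get selected more often by chance, which is the source of the targeted exploration described above.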

research
02/26/2018

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling

Recent advances in deep reinforcement learning have made significant str...
research
08/04/2020

Exploring Variational Deep Q Networks

This study provides both analysis and a refined, research-ready implemen...
research
05/13/2019

Distributional Reinforcement Learning for Efficient Exploration

In distributional reinforcement learning (RL), the estimated distributio...
research
07/03/2015

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

Achieving efficient and scalable exploration in complex domains poses a ...
research
10/06/2021

Residual Overfit Method of Exploration

Exploration is a crucial aspect of bandit and reinforcement learning alg...
research
04/16/2021

Uncertainty Surrogates for Deep Learning

In this paper we introduce a novel way of estimating prediction uncertai...
research
01/28/2023

STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning

Directed Exploration is a crucial challenge in reinforcement learning (R...
