UCB Exploration via Q-Ensembles

06/05/2017
by Richard Y. Chen et al.

We show how an ensemble of Q^*-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the Q-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
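
The abstract's UCB strategy amounts to scoring each action by the ensemble's mean Q-value plus a bonus proportional to the ensemble's disagreement. Below is a minimal NumPy sketch of that action-selection rule; the function name, the lambda value, and the random Q-values are illustrative assumptions for exposition, not the authors' code.

```python
import numpy as np

def ucb_action(q_values: np.ndarray, lam: float = 0.1) -> int:
    """UCB action selection over an ensemble of Q-estimates.

    q_values: array of shape (K, num_actions), one row per ensemble member,
              holding Q_k(s, a) for the current state s.
    lam:      exploration coefficient weighting the ensemble's disagreement.
    """
    mean = q_values.mean(axis=0)   # empirical mean over the K heads
    std = q_values.std(axis=0)     # empirical std, a proxy for epistemic uncertainty
    return int(np.argmax(mean + lam * std))

# Illustrative usage: K = 10 heads, 4 actions, random values in place of real Q-estimates.
rng = np.random.default_rng(0)
q = rng.normal(size=(10, 4))
print(ucb_action(q, lam=0.1))
```

With lam = 0 this reduces to greedy action selection over the ensemble mean; larger lam values push the agent toward actions the ensemble disagrees about.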

Related research:

- Information-Directed Exploration for Deep Reinforcement Learning (12/18/2018): Efficient exploration remains a major challenge for reinforcement learni...
- Entropy-Aware Model Initialization for Effective Exploration in Deep Reinforcement Learning (08/24/2021): Encouraging exploration is a critical issue in deep reinforcement learni...
- Anti-Concentrated Confidence Bonuses for Scalable Exploration (10/21/2021): Intrinsic rewards play a central role in handling the exploration-exploi...
- Context-Dependent Upper-Confidence Bounds for Directed Exploration (11/15/2018): Directed exploration strategies for reinforcement learning are critical ...
- Hypermodels for Exploration (06/12/2020): We study the use of hypermodels to represent epistemic uncertainty and g...
- Efficient Inference and Exploration for Reinforcement Learning (10/12/2019): Despite an ever growing literature on reinforcement learning algorithms ...
- Principled Exploration via Optimistic Bootstrapping and Backward Induction (05/13/2021): One principled approach for provably efficient exploration is incorporat...