ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

07/02/2020
by   Michael Gimelfarb, et al.

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the ε-greedy exploration policy, which, despite its simplicity, remains one of the most frequently used forms of exploration. A key limitation of this policy, however, is the need to specify ε. We provide a novel Bayesian perspective of ε as a measure of the uniformity of the Q-value function. Building on this perspective, we introduce a closed-form Bayesian model update based on Bayesian model combination (BMC), which allows us to adapt ε using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, ε-BMC, efficiently balances exploration and exploitation on different problems, performing comparably to or better than the best tuned fixed annealing schedules and an alternative data-dependent ε adaptation scheme proposed in the literature.
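
For readers unfamiliar with the baseline policy discussed above, the following is a minimal sketch of standard ε-greedy action selection with a fixed ε. It is not the ε-BMC update, whose details are in the full paper and are not reproduced here. The function name `epsilon_greedy`, its parameters, and the list-of-floats representation of Q-values are assumptions made for illustration only.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index given Q-value estimates for one state.

    Hypothetical helper for illustration: with probability `epsilon`
    a uniformly random action is chosen (exploration); otherwise the
    action with the highest estimated Q-value is chosen (exploitation).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Explore: every action is equally likely.
        return rng.randrange(len(q_values))
    # Exploit: pick the first action with the maximal Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the sketch always acts greedily, and with `epsilon=1.0` it always samples uniformly; the difficulty the abstract points to is choosing a value, or a schedule of values, between these extremes.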


