Ensemble Bootstrapping for Q-Learning

02/28/2021
by   Oren Peer, et al.

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. As with over-estimation in Q-learning, the under-estimation bias may degrade performance in certain scenarios. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance. Finally, we demonstrate the superior performance of a deep RL variant of EBQL over other deep QL algorithms for a suite of ATARI games.
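The estimation problem mentioned in the abstract (estimating the maximal mean of a set of independent random variables) can be illustrated with a small simulation. The sketch below is not the authors' code. It compares a single estimator (the maximum of the sample means, as in Q-learning), a double estimator (one half of the samples selects the index and the other half evaluates it, as in Double-Q-learning), and an ensemble estimator. The ensemble split used here, where one subset selects the index and the remaining K-1 subsets evaluate it, is an assumption about what "EBQL-like updates" means. The Gaussian variables, their means, the sample counts, and the function names are all illustrative choices.

```python
import numpy as np


def max_mean_estimates(samples, k):
    """Estimate max_i E[X_i] from samples of shape (n_vars, n_samples).

    k is the (assumed) ensemble size; it must divide n_samples evenly.
    Returns (single, double, ensemble) estimates.
    """
    n_samples = samples.shape[1]

    # Single estimator: maximum of the sample means (same data selects and evaluates).
    single = samples.mean(axis=1).max()

    # Double estimator: first half selects the index, second half evaluates it.
    half = n_samples // 2
    select, evaluate = samples[:, :half], samples[:, half:]
    double = evaluate.mean(axis=1)[select.mean(axis=1).argmax()]

    # Ensemble estimator (assumed form): one subset selects the index,
    # the remaining k-1 subsets are pooled to evaluate it.
    subsets = np.split(samples, k, axis=1)
    best = subsets[0].mean(axis=1).argmax()
    rest = np.concatenate(subsets[1:], axis=1)
    ensemble = rest.mean(axis=1)[best]

    return single, double, ensemble


def estimate_biases(seed=0, trials=5000, n_samples=12, k=4):
    """Monte Carlo bias and MSE of each estimator on illustrative Gaussian data."""
    rng = np.random.default_rng(seed)
    means = np.linspace(0.0, 1.0, 5)  # hypothetical true means; the maximum is 1.0
    results = np.empty((trials, 3))
    for t in range(trials):
        samples = rng.normal(means[:, None], 1.0, size=(means.size, n_samples))
        results[t] = max_mean_estimates(samples, k)
    errors = results - means.max()
    names = ("single", "double", "ensemble")
    return {
        name: {"bias": errors[:, i].mean(), "mse": (errors[:, i] ** 2).mean()}
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    for name, stats in estimate_biases().items():
        print(f"{name:>8}: bias={stats['bias']:+.3f}  mse={stats['mse']:.3f}")
```

In this setup the single estimator is biased upward, while the two sample-splitting estimators are biased downward because the evaluation samples are independent of the selection. How the MSE values compare depends on the chosen means, noise level, and K, so the numbers from this toy setting should not be read as a reproduction of the paper's result.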


research
02/16/2020

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Q-learning suffers from overestimation bias, because it approximates the...
research
06/12/2020

Decorrelated Double Q-learning

Q-learning with value function approximation may have the poor performan...
research
09/29/2021

On the Estimation Bias in Double Q-Learning

Double Q-learning is a classical method for reducing overestimation bias...
research
05/03/2021

Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks

Double Q-learning is a popular reinforcement learning algorithm in Marko...
research
01/20/2022

Two-Sample Testing in Reinforcement Learning

Value-based reinforcement-learning algorithms have shown strong performa...
research
03/22/2022

Action Candidate Driven Clipped Double Q-learning for Discrete and Continuous Action Tasks

Double Q-learning is a popular reinforcement learning algorithm in Marko...
research
06/17/2023

Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm

Reinforcement Learning has achieved tremendous success in the many Atari...
