Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

09/16/2022
by   Litian Liang, et al.

In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods; however, none has shown success in sample-efficient learning by addressing estimation variance as the root cause of overestimation. In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means. Despite its simplicity, MeanQ shows remarkable sample efficiency in experiments on the Atari Learning Environment benchmark. Importantly, we find that an ensemble of size 5 sufficiently reduces estimation variance to obviate the lagging target network, eliminating it as a source of bias and further gaining sample efficiency. We justify intuitively and empirically the design choices in MeanQ, including the necessity of independent experience sampling. On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments and by 68% on average; it also outperforms SUNRISE at 500K interaction steps in 21/26 environments and by 49% on average, and achieves average human-level performance using 200K (±100K) interaction steps. Our implementation is available at https://github.com/indylab/MeanQ.
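The abstract states the mechanism concisely: each of K online networks regresses toward a shared target equal to the ensemble's mean value estimate, each member samples its own minibatch, and there is no lagging target network. Below is a minimal PyTorch sketch of that target computation, assuming a standard DQN-style setup; `QNetwork`, `replay_buffer.sample()`, and the hyperparameters are illustrative placeholders, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

K = 5         # ensemble size; the paper reports that size 5 suffices
GAMMA = 0.99  # discount factor (assumed, not taken from the paper)

class QNetwork(nn.Module):
    """Tiny MLP stand-in for the paper's Atari CNN (assumption for brevity)."""
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)

# K independently initialized members, each with its own optimizer.
ensemble = [QNetwork() for _ in range(K)]
optims = [torch.optim.Adam(q.parameters(), lr=1e-4) for q in ensemble]

def train_step(replay_buffer, batch_size=32):
    """One MeanQ-style update; replay_buffer.sample() is an assumed API
    returning (states, actions, rewards, next_states, dones) tensors."""
    for q, opt in zip(ensemble, optims):
        # Independent experience sampling: each member draws its own
        # minibatch, which the paper argues keeps the members decorrelated.
        s, a, r, s_next, done = replay_buffer.sample(batch_size)

        with torch.no_grad():
            # Target uses the *mean* of all K online networks -- no lagging
            # target network, per the paper's ensemble-of-5 finding.
            q_next = torch.stack([m(s_next) for m in ensemble]).mean(dim=0)
            target = r + GAMMA * (1.0 - done) * q_next.max(dim=1).values

        pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The two design choices the abstract highlights are both visible here: the target averages all K online networks instead of querying a single lagging copy, and each member's gradient step uses an independently drawn batch.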

Related research

02/27/2018 | The Mirage of Action-Dependent Baselines in Reinforcement Learning
Policy gradient methods are a widely used class of model-free reinforcem...

07/05/2021 | Ensemble and Auxiliary Tasks for Data-Efficient Deep Reinforcement Learning
Ensemble and auxiliary tasks are both well known to improve the performa...

06/15/2020 | Inner Ensemble Nets
We introduce Inner Ensemble Networks (IENs) which reduce the variance wi...

06/29/2023 | Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
We propose a novel value approximation method, namely Eigensubspace Regu...

06/24/2020 | Reducing Overestimation Bias by Increasing Representation Dissimilarity in Ensemble Based Deep Q-Learning
The first deep RL algorithm, DQN, was limited by the overestimation bias...

06/20/2023 | Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
The ensemble method is a promising way to mitigate the overestimation is...

02/16/2020 | Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Q-learning suffers from overestimation bias, because it approximates the...
