Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

06/20/2023
by Hang Wang, et al.

The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial because the function approximation errors vary over time during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to be nearly zero, thereby coping with the impact of the time-varying approximation errors. Motivated by the theoretical findings, we advocate combining the ensemble method with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization, which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments show that AdaEQ improves learning performance over existing methods on the MuJoCo benchmark.
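The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the `threshold` parameter, the step size of one, and the use of a signed error estimate as the feedback signal are all assumptions made for illustration. The sketch relies only on the fact that taking the minimum over more estimates can only lower the target, so growing the subset counteracts overestimation and shrinking it counteracts underestimation.

```python
import random


def ensemble_target(q_values, m, rng):
    """Return the minimum over a random subset of m estimates.

    q_values: list of action-value estimates, one per ensemble member.
    m: number of members used in the target (the "ensemble size").
    """
    subset = rng.sample(q_values, m)
    return min(subset)


def adapt_ensemble_size(m, error, n, threshold=0.0):
    """Hypothetical feedback rule (an assumption, not taken from the paper).

    error: signed estimate of the bias (estimated value minus observed return).
    n: total number of ensemble members available.
    """
    if error > threshold:
        # Estimates look too high: use more members in the min.
        return min(m + 1, n)
    if error < -threshold:
        # Estimates look too low: use fewer members in the min.
        return max(m - 1, 1)
    return m


if __name__ == "__main__":
    rng = random.Random(0)
    q_values = [1.2, 0.9, 1.5, 1.1, 1.0]  # made-up estimates for one state-action pair
    m = 2
    target = ensemble_target(q_values, m, rng)
    m = adapt_ensemble_size(m, error=0.3, n=len(q_values), threshold=0.1)
    print(target, m)
```

In this sketch the ensemble size is clamped to the range from 1 to `n`, so the feedback loop cannot request more members than exist.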


