Mixture of Step Returns in Bootstrapped DQN

07/16/2020
by   Po-Han Chiang, et al.

The concept of utilizing multi-step returns for updating value functions has been adopted in deep reinforcement learning (DRL) for a number of years. Updating value functions with different backup lengths provides advantages in different aspects, including the bias and variance of value estimates, the convergence speed, and the exploration behavior of the agent. Conventional methods such as TD(λ) leverage these advantages by using a target value equivalent to an exponential average of different-step returns. Nevertheless, integrating step returns into a single target sacrifices the diversity of the advantages offered by different step-return targets. To address this issue, we propose Mixture Bootstrapped DQN (MB-DQN), which is built on top of bootstrapped DQN and uses a different backup length for each bootstrapped head. MB-DQN enables heterogeneity of the target values that is unavailable in approaches relying only on a single target value, and is thereby able to maintain the advantages offered by different backup lengths. In this paper, we first discuss the motivational insights through a simple maze environment. To validate the effectiveness of MB-DQN, we perform experiments on the Atari 2600 benchmark environments and demonstrate the performance improvement of MB-DQN over a number of baseline methods. We further provide a set of ablation studies to examine the impacts of different design configurations of MB-DQN.
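To make the contrast concrete, the sketch below (not the authors' implementation; function names, the backup lengths, and all numerical values are illustrative assumptions) computes n-step return targets, forms the TD(λ)-style exponential average that collapses them into a single target, and then shows the MB-DQN alternative of assigning a distinct backup length to each bootstrapped head:

```python
import numpy as np

def n_step_return(rewards, bootstrap_value, n, gamma=0.99):
    """n-step target: G_t^(n) = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * V(s_{t+n})."""
    g = sum(gamma ** k * rewards[k] for k in range(n))
    return g + gamma ** n * bootstrap_value

def lambda_return(rewards, values, lam=0.9, gamma=0.99):
    """TD(lambda)-style target: an exponentially weighted average of the
    1-step through T-step returns, collapsed into a single scalar.
    values[k] is the bootstrap estimate V(s_{t+k+1})."""
    T = len(rewards)
    returns = [n_step_return(rewards, values[n - 1], n, gamma) for n in range(1, T + 1)]
    # Weights (1-lam)*lam^(n-1) for n < T, with the tail mass lam^(T-1) on n = T.
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, T)] + [lam ** (T - 1)]
    return float(np.dot(weights, returns))

# MB-DQN idea (sketch): rather than mixing all step returns into one target,
# each bootstrapped head k keeps its own backup length n_k, so the heads
# train against heterogeneous targets.
backup_lengths = [1, 3, 5, 7]          # hypothetical per-head choice
rewards = [1.0, 0.5, 0.0, 2.0, 1.0, 0.0, 0.5]
values = [0.2] * len(rewards)          # placeholder bootstrap estimates
per_head_targets = [n_step_return(rewards, values[n - 1], n) for n in backup_lengths]
```

Note how `lambda_return` emits one number per transition, whereas `per_head_targets` preserves one target per head, which is the heterogeneity the abstract refers to.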


