The Effect of Multi-step Methods on Overestimation in Deep Reinforcement Learning

06/23/2020
by Lingheng Meng, et al.

Multi-step (also called n-step) methods in reinforcement learning (RL) have been shown, both theoretically and empirically, to be more efficient than 1-step methods in tasks with tabular value-function representations, owing to faster propagation of the reward signal. Recent research in Deep Reinforcement Learning (DRL) likewise shows that multi-step methods improve learning speed and final performance in applications where the value function and policy are represented with deep neural networks. However, what actually contributes to this performance boost remains poorly understood. In this work, we analyze the effect of multi-step methods on alleviating the overestimation problem in DRL, where multi-step experiences are sampled from a replay buffer. Specifically, building on Deep Deterministic Policy Gradient (DDPG), we propose Multi-step DDPG (MDDPG), in which different step sizes are manually set, and a variant, Mixed Multi-step DDPG (MMDDPG), in which an average over different multi-step backups is used as the update target of the Q-value function. Empirically, we show that both MDDPG and MMDDPG suffer significantly less from overestimation than DDPG with 1-step backup, which in turn yields better final performance and faster learning. We also discuss the advantages and disadvantages of different ways of performing multi-step expansion to reduce approximation error, and expose the tradeoff between overestimation and underestimation that underlies offline multi-step methods. Finally, since they achieve comparable final performance and learning speed, we compare the computational resource needs of our proposed methods with those of Twin Delayed Deep Deterministic Policy Gradient (TD3), a state-of-the-art algorithm that addresses overestimation in actor-critic methods.
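As a concrete illustration of the two update targets described above, the following is a minimal Python sketch, not the authors' implementation. The function names, the argument layout, and the assumption that no terminal state occurs inside the n-step window are choices made for illustration; in practice the bootstrap values would come from the target critic evaluated at the target policy's action, and terminal transitions would truncate the backup.

    import numpy as np

    def n_step_target(rewards, bootstrap_q, gamma):
        """n-step backup target:
        y = sum_{k=0}^{n-1} gamma^k * r_{t+k} + gamma^n * Q'(s_{t+n}, mu'(s_{t+n})).
        rewards holds r_t, ..., r_{t+n-1}; bootstrap_q stands in for the
        target critic's estimate at s_{t+n}."""
        n = len(rewards)
        discounts = gamma ** np.arange(n)  # 1, gamma, ..., gamma^{n-1}
        return float(np.dot(discounts, rewards) + gamma ** n * bootstrap_q)

    def mixed_target(rewards, bootstrap_qs, gamma):
        """MMDDPG-style target: the average of the 1-step through n-step
        backups, where bootstrap_qs[k] stands in for the target critic's
        estimate at s_{t+k+1}."""
        targets = [n_step_target(rewards[:k + 1], bootstrap_qs[k], gamma)
                   for k in range(len(rewards))]
        return float(np.mean(targets))

    # Example with a 3-step segment sampled from the replay buffer:
    gamma = 0.99
    rewards = [1.0, 0.5, 0.2]        # r_t, r_{t+1}, r_{t+2}
    bootstrap_qs = [2.0, 1.8, 1.5]   # target-critic estimates at s_{t+1}..s_{t+3}
    y_3step = n_step_target(rewards, bootstrap_qs[-1], gamma)  # MDDPG-style 3-step target
    y_mixed = mixed_target(rewards, bootstrap_qs, gamma)       # MMDDPG-style mixed target

Averaging over several step sizes, rather than committing to a single n, is one plain way to realize the tradeoff between overestimation and underestimation that the abstract attributes to offline multi-step methods.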

Related research:

The Reactor: A Sample-Efficient Actor-Critic Architecture (04/15/2017)
In this work we present a new reinforcement learning agent, called React...

Long N-step Surrogate Stage Reward to Reduce Variances of Deep Reinforcement Learning in Complex Problems (10/10/2022)
High variances in reinforcement learning have shown impeding successful ...

Greedy Multi-step Off-Policy Reinforcement Learning (02/23/2021)
Multi-step off-policy reinforcement learning has achieved great success....

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target (01/22/2019)
Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...

Cooperative Deep Q-learning Framework for Environments Providing Image Feedback (10/28/2021)
In this paper, we address two key challenges in deep reinforcement learn...

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions (05/31/2022)
The successes of deep Reinforcement Learning (RL) are limited to setting...

A State Aggregation Approach for Solving Knapsack Problem with Deep Reinforcement Learning (04/25/2020)
This paper proposes a Deep Reinforcement Learning (DRL) approach for sol...
