Emphatic Algorithms for Deep Reinforcement Learning

06/21/2021
by   Ray Jiang, et al.

Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation and off-policy sampling; this is known as the "deadly triad". The emphatic temporal difference algorithm, ETD(λ), ensures convergence in the linear case by appropriately weighting the TD(λ) updates. In this paper, we extend the use of emphatic methods to deep reinforcement learning agents. We show that naively adapting ETD(λ) to popular deep reinforcement learning algorithms, which use forward-view multi-step returns, results in poor performance. We then derive new emphatic algorithms for use in the context of such algorithms, and we demonstrate that they provide noticeable benefits in small problems designed to highlight the instability of TD methods. Finally, we observe improved performance when applying these algorithms at scale on classic Atari games from the Arcade Learning Environment.
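
To make the phrase "appropriately weighting the TD(λ) updates" concrete, here is a minimal sketch of one step of linear ETD(λ) as it is commonly stated in the literature, not the new deep-RL variants this paper derives. It assumes a constant discount and trace-decay parameter; the function name `etd_lambda_step`, its argument names, and the default unit interest are illustrative choices, not taken from the paper.

```python
def etd_lambda_step(theta, e, F, rho_prev, x, reward, x_next, rho,
                    alpha=0.01, gamma=0.9, lam=0.0, interest=1.0):
    """One linear ETD(lambda) update (illustrative sketch, hypothetical API).

    theta    : weight vector (list of floats)
    e        : emphatic eligibility trace (list of floats)
    F        : follow-on trace from the previous step (0.0 before the first step)
    rho_prev : importance-sampling ratio pi/mu of the previous action
    rho      : importance-sampling ratio pi/mu of the current action
    x, x_next: feature vectors of the current and next state
    Returns the updated (theta, e, F).
    """
    # Follow-on trace: discounted, importance-weighted accumulation of interest.
    F = rho_prev * gamma * F + interest
    # Emphasis: how strongly the update at this state is weighted.
    M = lam * interest + (1.0 - lam) * F
    # Eligibility trace, with the current features scaled by the emphasis.
    e = [rho * (gamma * lam * e_i + M * x_i) for e_i, x_i in zip(e, x)]
    # Ordinary one-step TD error under the linear value estimate.
    v = sum(t * x_i for t, x_i in zip(theta, x))
    v_next = sum(t * x_i for t, x_i in zip(theta, x_next))
    delta = reward + gamma * v_next - v
    # Weight update along the emphatic trace.
    theta = [t + alpha * delta * e_i for t, e_i in zip(theta, e)]
    return theta, e, F


# Usage: two transitions on a two-feature problem (made-up numbers).
theta, e, F = [0.0, 0.0], [0.0, 0.0], 0.0
theta, e, F = etd_lambda_step(theta, e, F, rho_prev=1.0, x=[1.0, 0.0],
                              reward=1.0, x_next=[0.0, 1.0], rho=1.0, alpha=0.5)
theta, e, F = etd_lambda_step(theta, e, F, rho_prev=2.0, x=[0.0, 1.0],
                              reward=0.0, x_next=[1.0, 0.0], rho=0.5, alpha=0.5)
```

With the emphasis fixed at 1 this reduces to off-policy TD(λ) with importance sampling; the only difference is the scalar `M`, which re-weights each state's update using the follow-on trace.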

research
11/13/2019

A Convergent Off-Policy Temporal Difference Algorithm

Learning the value function of a given policy (target policy) from the d...
research
12/06/2018

Deep Reinforcement Learning and the Deadly Triad

We know from reinforcement learning theory that temporal difference lear...
research
02/28/2016

Investigating practical linear temporal difference learning

Off-policy reinforcement learning has many applications including: learn...
research
12/11/2022

Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks

In order to avoid conventional controlling methods which created obstacl...
research
05/10/2019

Design of Artificial Intelligence Agents for Games using Deep Reinforcement Learning

In order to perform a large variety of tasks and to achieve human-level per...
research
02/28/2020

On Catastrophic Interference in Atari 2600 Games

Model-free deep reinforcement learning algorithms are troubled with poor...
research
01/22/2019

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...
