Deep Reinforcement Learning and the Deadly Triad

12/06/2018
by Hado van Hasselt, et al.

We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge, with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties, which indicates that there is at least a partial gap in our understanding. In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system play a role in the emergence of the deadly triad and in the agent's performance.
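
To make the divergence claim concrete, here is a minimal sketch of the well-known two-state "w -> 2w" example, which combines all three triad ingredients in their simplest form. It is an illustration only, not the experimental setup of this paper: the function name, step size, discount, and step count are assumptions chosen for the demonstration. The value estimate is linear in a single weight `w` (function approximation), the target uses the current estimate of the next state (bootstrapping), and only the transition from the first state to the second is ever updated (an off-policy update distribution).

```python
def td0_w_to_2w(w0=1.0, alpha=0.1, gamma=0.9, steps=100):
    """Semi-gradient TD(0) on a single repeated transition s -> s'.

    Hypothetical illustration: features are phi(s) = 1 and phi(s') = 2,
    so the value estimates are v(s) = w and v(s') = 2w. The reward is 0.
    """
    phi_s, phi_next = 1.0, 2.0
    reward = 0.0
    w = w0
    history = [w]
    for _ in range(steps):
        # Bootstrapped target: reward plus discounted estimate of the next state.
        td_error = reward + gamma * phi_next * w - phi_s * w
        # Semi-gradient update; the gradient of v(s) with respect to w is phi_s.
        w += alpha * td_error * phi_s
        history.append(w)
    return history


diverging = td0_w_to_2w(gamma=0.9)  # each step multiplies w by 1 + alpha * (2 * gamma - 1) = 1.08
shrinking = td0_w_to_2w(gamma=0.4)  # here the multiplier is 0.98, so w decays toward 0
print(diverging[-1], shrinking[-1])
```

Each update multiplies `w` by `1 + alpha * (2 * gamma - 1)`, so in this toy setting the weight grows without bound whenever `gamma > 0.5` and shrinks toward zero otherwise. Deep Q-networks are far more complex than this example, which is part of why their behaviour under the triad has to be studied empirically.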

