Correcting Momentum in Temporal Difference Learning

06/07/2021
by Emmanuel Bengio, et al.

A common optimization tool used in deep reinforcement learning is momentum, which consists of accumulating and discounting past gradients and reapplying them at each iteration. We argue that, unlike in supervised learning, momentum in Temporal Difference (TD) learning accumulates gradients that become doubly stale: not only does the gradient of the loss change due to parameter updates, but the loss itself changes due to bootstrapping. We first show that this phenomenon exists, then propose a first-order correction term to momentum. We show that this correction term improves sample efficiency in policy evaluation by correcting for target value drift. An important insight of this work is that deep RL methods are not always best served by directly importing techniques from the supervised setting.
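To make the staleness argument concrete, below is a minimal sketch of plain heavy-ball momentum applied to semi-gradient TD(0) policy evaluation, i.e. the uncorrected setting the abstract critiques. The toy chain environment, the step sizes, and all variable names are illustrative assumptions, not the paper's experimental setup or its proposed correction term.

    # Minimal sketch (illustrative assumptions throughout; not the paper's
    # method): semi-gradient TD(0) with heavy-ball momentum on a 3-state chain.
    import numpy as np

    n_states = 3
    phi = np.eye(n_states)             # one-hot (tabular) features
    theta = np.zeros(n_states)         # value-function parameters
    m = np.zeros(n_states)             # momentum buffer of discounted gradients
    alpha, beta, gamma = 0.1, 0.9, 0.9  # step size, momentum, discount (assumed)

    def step(s):
        """Toy chain 0 -> 1 -> 2 (terminal); reward 1 on reaching the end."""
        s_next = s + 1
        reward = 1.0 if s_next == n_states - 1 else 0.0
        return reward, s_next, s_next == n_states - 1

    s = 0
    for _ in range(500):
        r, s_next, done = step(s)
        v = phi[s] @ theta
        # Bootstrapped target: it depends on theta, so the loss itself drifts
        # as theta is updated -- the second source of staleness.
        target = r if done else r + gamma * (phi[s_next] @ theta)
        grad = -(target - v) * phi[s]  # semi-gradient of 0.5 * (target - v)**2
        # Heavy-ball momentum: past gradients held in m were computed under
        # both old parameters and old targets, hence "doubly stale".
        m = beta * m + grad
        theta = theta - alpha * m
        s = 0 if done else s_next

    print("estimated values:", theta)  # should approach [gamma, 1, 0] here

With beta = 0 this reduces to plain semi-gradient TD(0); with beta > 0, the buffer m keeps replaying gradients computed under both older parameters and older bootstrapped targets, which is the double staleness the paper's first-order correction is designed to address.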

