Investigating the Edge of Stability Phenomenon in Reinforcement Learning

07/09/2023
by Rares Iordan, et al.

Recent progress in understanding the optimisation dynamics of neural networks trained with full-batch gradient descent with momentum has come from the discovery of the edge of stability phenomenon in supervised learning. The phenomenon occurs when the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss; the eigenvalue then oscillates around that threshold, and the loss becomes locally unstable yet still decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically in off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences from supervised learning, such as the non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike in supervised learning, however, we observe strong differences depending on the underlying loss: DQN, which uses a Huber loss, shows a strong edge of stability effect that we do not observe with C51, which uses a cross-entropy loss. Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.
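To make the divergence threshold concrete: for plain full-batch gradient descent with step size η on a quadratic loss, training is stable only while the leading Hessian eigenvalue (the "sharpness") stays below 2/η; with heavy-ball momentum β, the classical quadratic-analysis threshold becomes 2(1+β)/η. The sketch below (not from the paper; a toy illustration under these assumptions) estimates the sharpness of a small quadratic loss by power iteration on Hessian-vector products and compares it to the threshold.

```python
import numpy as np

def sharpness(hessian_vector_product, dim, iters=100, seed=0):
    """Estimate the leading Hessian eigenvalue by power iteration
    using only Hessian-vector products (no explicit Hessian needed)."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hessian_vector_product(v)
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient at the converged direction
    return float(v @ hessian_vector_product(v))

# Toy quadratic loss L(w) = 0.5 * w^T H w with a known diagonal Hessian.
H = np.diag([3.0, 1.0, 0.5])
hvp = lambda v: H @ v

eta, beta = 0.1, 0.9
lam_max = sharpness(hvp, dim=3)
gd_threshold = 2.0 / eta                    # plain gradient descent
momentum_threshold = 2.0 * (1 + beta) / eta  # heavy-ball momentum

print(lam_max < gd_threshold)  # sharpness 3.0 is well below 2/eta = 20
```

In deep networks the same estimator is typically run with autodiff Hessian-vector products on the full-batch loss; the edge of stability observation is that the measured sharpness rises during training until it hovers around the threshold rather than staying safely below it.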


