DeepAI AI Chat
Log In Sign Up

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods

10/05/2018
by   Peter Henderson, et al.
Stanford University
McGill University
12

Recent analyses of certain gradient descent optimization methods have shown that performance can degrade in some settings - such as with stochasticity or implicit momentum. In deep reinforcement learning (Deep RL), such optimization methods are often used for training neural networks via the temporal difference error or policy gradient. As an agent improves over time, the optimization target changes and thus the loss landscape (and local optima) change. Due to the failure modes of those methods, the ideal choice of optimizer for Deep RL remains unclear. As such, we provide an empirical analysis of the effects that a wide range of gradient descent optimizers and their hyperparameters have on policy gradient methods, a subset of Deep RL algorithms, for benchmark continuous control tasks. We find that adaptive optimizers have a narrow window of effective learning rates, diverging in other cases, and that the effectiveness of momentum varies depending on the properties of the environment. Our analysis suggests that there is significant interplay between the dynamics of the environment and Deep RL algorithm properties which aren't necessarily accounted for by traditional adaptive gradient methods. We provide suggestions for optimal settings of current methods and further lines of research based on our findings.

READ FULL TEXT

page 12

page 15

page 17

page 18

page 20

page 40

11/15/2019

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

Model-free reinforcement learning algorithms such as Deep Deterministic ...
02/12/2020

LaProp: a Better Way to Combine Momentum with Adaptive Gradient

Identifying a divergence problem in Adam, we propose a new optimizer, La...
06/30/2023

Resetting the Optimizer in Deep RL: An Empirical Study

We focus on the task of approximating the optimal value function in deep...
06/07/2021

Correcting Momentum in Temporal Difference Learning

A common optimization tool used in deep reinforcement learning is moment...
08/05/2020

ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution

Distribution-based search algorithms are an effective approach for evolu...
10/21/2019

Regularization Matters in Policy Optimization

Deep Reinforcement Learning (Deep RL) has been receiving increasingly mo...
10/11/2019

On Empirical Comparisons of Optimizers for Deep Learning

Selecting an optimizer is a central step in the contemporary deep learni...

Code Repositories

WhereDidMyOptimumGo

An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods - EWRL Workshop 2018


view repo

cool_stuff

A list of cool tools and libraries I like to keep around


view repo