Resetting the Optimizer in Deep RL: An Empirical Study

06/30/2023
by Kavosh Asadi, et al.

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process consists of approximately solving a sequence of optimization problems whose objective function can change from one iteration to the next. The common approach is to employ modern variants of the stochastic gradient descent algorithm, such as Adam. These optimizers maintain their own internal parameters, such as estimates of the first and second moments of the gradient, and update these parameters over time. As a result, information obtained in previous iterations is carried over and used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer when the optimization landscape of previous iterations differs substantially from that of the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
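To make the resetting idea concrete, below is a minimal PyTorch sketch of how it could be applied in a DQN-style training loop, treating each target-network sync as the start of a new iteration. The network sizes, learning rate, and TARGET_UPDATE_PERIOD are illustrative assumptions rather than values from the paper, and the TD-loss computation is elided.

```python
# A minimal sketch of optimizer resetting in a DQN-style loop, assuming each
# target-network sync marks the start of a new optimization iteration.
import torch
import torch.nn as nn

# Illustrative network and hyperparameters (not values from the paper).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())

LR = 1e-4
TARGET_UPDATE_PERIOD = 1_000  # assumed length of one "iteration"
optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)

for step in range(1, 10_001):
    # ... sample a batch and take a TD gradient step with `optimizer` here ...

    if step % TARGET_UPDATE_PERIOD == 0:
        # New iteration: the target (and hence the objective) changes, so
        # discard the optimizer's internal state (Adam's first/second moment
        # estimates) by re-instantiating it with the same hyperparameters.
        target_net.load_state_dict(q_net.state_dict())
        optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)
```

Re-instantiating the optimizer is equivalent to clearing its internal state: only the moment estimates and step counters are discarded, while the network weights and the learning-rate setting are kept.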

Related research

Deep Q-Network with Proximal Iteration (12/10/2021)
We employ Proximal Iteration for value-function optimization in reinforc...

Where Did My Optimum Go?: An Empirical Analysis of Gradient Descent Optimization in Policy Gradient Methods (10/05/2018)
Recent analyses of certain gradient descent optimization methods have sh...

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning (02/06/2022)
In reinforcement learning (RL), offline learning decoupled learning from...

AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy (12/24/2020)
Optimizers that further adjust the scale of gradient, such as Adam, Natu...

Iterative Empirical Game Solving via Single Policy Best Response (06/03/2021)
Policy-Space Response Oracles (PSRO) is a general algorithmic framework ...

Surfing: Iterative optimization over incrementally trained deep networks (07/19/2019)
We investigate a sequential optimization procedure to minimize the empir...

Computability of Optimizers (01/15/2023)
Optimization problems are a staple of today's scientific and technical l...
