ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution

08/05/2020
by Nihat Engin Toklu, et al.

Distribution-based search algorithms are an effective approach for evolutionary reinforcement learning of neural network controllers. In these algorithms, gradients of the total reward with respect to the policy parameters are estimated using a population of solutions drawn from a search distribution, and then used for policy optimization with stochastic gradient ascent. A common choice in the community is to use the Adam optimization algorithm to obtain adaptive behavior during gradient ascent, due to its success in a variety of supervised learning settings. As an alternative to Adam, we propose to enhance classical momentum-based gradient ascent with two simple techniques: gradient normalization and update clipping. We argue that the resulting optimizer, called ClipUp (short for "clipped updates"), is a better choice for distribution-based policy evolution because its working principles are simple and easy to understand, and its hyperparameters can be tuned more intuitively in practice. Moreover, it removes the need to re-tune hyperparameters if the reward scale changes. Experiments show that ClipUp is competitive with Adam despite its simplicity and is effective on challenging continuous control benchmarks, including the Humanoid control task based on the Bullet physics simulator.
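To make the two ingredients named above concrete, here is a minimal sketch of how a ClipUp-style update could look: the estimated gradient is normalized to unit length, fed through classical (heavy-ball) momentum, and the resulting update is clipped so its norm never exceeds a maximum step length. The function name clipup_step, the hyperparameter names (step_size, momentum, max_speed), and the default values are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def clipup_step(params, grad, velocity,
                step_size=0.15, momentum=0.9, max_speed=0.3):
    """One ClipUp-style update (sketch). Returns updated (params, velocity)."""
    # Gradient normalization: only the direction of the estimated gradient is used.
    grad_norm = np.linalg.norm(grad)
    if grad_norm > 0:
        grad = grad / grad_norm

    # Classical momentum on the normalized gradient.
    velocity = momentum * velocity + step_size * grad

    # Update clipping: bound the length of the step by max_speed.
    speed = np.linalg.norm(velocity)
    if speed > max_speed:
        velocity = velocity * (max_speed / speed)

    # Gradient ascent on the total reward.
    params = params + velocity
    return params, velocity
```

Under this sketch, the distance moved per iteration is bounded by max_speed regardless of the magnitude of the reward (and hence of the gradient estimate), which is consistent with the abstract's claim that rescaling the reward does not require re-tuning the hyperparameters.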
