Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

05/24/2022
by Maximilian Hüttenrauch, et al.

Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution. Yet the use of rankings introduces several issues: the underlying optimization objective is often unclear, i.e., we do not optimize the expected fitness; while these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality because the algorithms are ignorant of the regret; and noisy fitness function evaluations may yield solutions that are highly sub-optimal in expectation. In contrast, stochastic optimizers motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings. MORE can be derived by applying natural policy gradients with compatible function approximation, and it uses information-theoretic constraints to ensure the stability of the policy update. While MORE does not suffer from the limitations listed above, it often cannot achieve state-of-the-art performance compared to ranking-based methods. We improve MORE in three ways: we decouple the updates of the mean and covariance of the search distribution, allowing more aggressive updates of the mean while keeping the covariance update conservative; we introduce an improved entropy scheduling technique based on an evolution path, which results in faster convergence; and we use a simplified and more effective model learning approach than that of the original paper. We compare our algorithm to state-of-the-art black-box optimization algorithms on standard optimization tasks as well as on episodic RL tasks in robotics, where small regret is also crucial. We obtain competitive results on the benchmark functions and clearly outperform ranking-based methods in terms of regret on the RL tasks.
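To make the decoupled update concrete, below is a minimal, illustrative Python sketch of a MORE-style step, not the authors' implementation. It fits the quadratic surrogate model of the fitness that MORE relies on and then updates a Gaussian search distribution in natural-parameter form, with separate step multipliers for the mean and the covariance. The fixed coefficients `eta_mean`, `eta_cov`, and `omega` are hypothetical stand-ins for the multipliers that the paper obtains by solving KL trust-region and entropy constraints via a dual; the evolution-path entropy scheduling and the improved model learning are omitted here.

```python
import numpy as np


def fit_quadratic_surrogate(xs, fs, ridge=1e-8):
    """Least-squares fit of the quadratic model f(x) ~= x^T A x + a^T x + c
    that MORE uses as a surrogate of the fitness. Returns (A, a)."""
    n, d = xs.shape
    iu = np.triu_indices(d)
    quad = xs[:, iu[0]] * xs[:, iu[1]]              # upper-triangular products
    Phi = np.hstack([quad, xs, np.ones((n, 1))])    # design matrix
    w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ fs)
    n_quad = len(iu[0])
    A = np.zeros((d, d))
    A[iu] = w[:n_quad]
    A = 0.5 * (A + A.T)                             # symmetrize coefficients
    a = w[n_quad:n_quad + d]
    return A, a


def decoupled_more_step(mean, cov, xs, fs, eta_mean=1.0, eta_cov=20.0, omega=0.1):
    """One Gaussian update in natural-parameter form with decoupled step
    multipliers: a small eta_mean gives an aggressive mean update, a large
    eta_cov keeps the covariance update conservative, and omega pulls the
    update toward higher entropy. (In the paper these multipliers come from
    KL and entropy constraints solved via a dual; here they are fixed.)"""
    Q = np.linalg.inv(cov)                          # old precision
    A, a = fit_quadratic_surrogate(xs, fs)
    # Covariance: interpolate old precision with the surrogate curvature.
    Q_new = (eta_cov * Q - 2.0 * A) / (eta_cov + omega)
    # Mean: the (eta + omega) normalizer cancels in the mean update.
    mean_new = np.linalg.solve(eta_mean * Q - 2.0 * A, eta_mean * Q @ mean + a)
    return mean_new, np.linalg.inv(Q_new)


# Usage: maximize the (noise-free) fitness f(x) = -||x - 1||^2 in 5 dimensions.
rng = np.random.default_rng(0)
d = 5
mean, cov = np.zeros(d), np.eye(d)
for _ in range(50):
    xs = rng.multivariate_normal(mean, cov, size=64)
    fs = -np.sum((xs - 1.0) ** 2, axis=1)
    mean, cov = decoupled_more_step(mean, cov, xs, fs)
print(np.round(mean, 3))                            # approaches [1, 1, 1, 1, 1]
```

The decoupling is visible in the two multipliers: the mean exploits the surrogate quickly, while the slowly shrinking covariance preserves exploration, which is what keeps the sampled solutions, and hence the regret, under control rather than only the mean estimate.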

