Log In Sign Up

Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

by   Maximilian Hüttenrauch, et al.

Most successful stochastic black-box optimizers, such as CMA-ES, use rankings of the individual samples to obtain a new search distribution. Yet, the use of rankings also introduces several issues such as the underlying optimization objective is often unclear, i.e., we do not optimize the expected fitness. Further, while these algorithms typically produce a high-quality mean estimate of the search distribution, the produced samples can have poor quality as these algorithms are ignorant of the regret. Lastly, noisy fitness function evaluations may result in solutions that are highly sub-optimal on expectation. In contrast, stochastic optimizers that are motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings. MORE can be derived by applying natural policy gradients and compatible function approximation, and is using information theoretic constraints to ensure the stability of the policy update. While MORE does not suffer from the listed limitations, it often cannot achieve state of the art performance in comparison to ranking based methods. We improve MORE by decoupling the update of the mean and covariance of the search distribution allowing for more aggressive updates on the mean while keeping the update on the covariance conservative, an improved entropy scheduling technique based on an evolution path which results in faster convergence and a simplified and more effective model learning approach in comparison to the original paper. We compare our algorithm to state of the art black-box optimization algorithms on standard optimization tasks as well as on episodic RL tasks in robotics where it is also crucial to have small regret. We obtain competitive results on benchmark functions and clearly outperform ranking-based methods in terms of regret on the RL tasks.


Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking

Evolution Strategy (ES) is a powerful black-box optimization technique b...

Compatible Natural Gradient Policy Search

Trust-region methods have yielded state-of-the-art results in policy sea...

Stochastic Implicit Natural Gradient for Black-box Optimization

Black-box optimization is primarily important for many compute-intensive...

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

We explore the use of Evolution Strategies (ES), a class of black box op...

Dimensionality Reduction and Prioritized Exploration for Policy Search

Black-box policy optimization is a class of reinforcement learning algor...

Unbiased Black-Box Complexities of Jump Functions

We analyze the unbiased black-box complexity of jump functions with smal...