Improving Sample Efficiency in Evolutionary RL Using Off-Policy Ranking

08/22/2022
by   Eshwar S R, et al.

Evolution Strategy (ES) is a powerful black-box optimization technique based on the idea of natural evolution. In each of its iterations, a key step entails ranking candidate solutions based on some fitness score. For an ES method in Reinforcement Learning (RL), this ranking step requires evaluating multiple policies. This is presently done via on-policy approaches: each policy's score is estimated by interacting several times with the environment using that policy. This leads to many wasted interactions since, once the ranking is done, only the data associated with the top-ranked policies is used for subsequent learning. To improve sample efficiency, we propose a novel off-policy alternative for ranking, based on a local approximation for the fitness function. We demonstrate our idea in the context of a state-of-the-art ES method called Augmented Random Search (ARS). Simulations in MuJoCo tasks show that, compared to the original ARS, our off-policy variant has similar running times for reaching reward thresholds but needs only around 70% as much data. It also outperforms the recent Trust Region ES. We believe our ideas should be extendable to other ES methods as well.
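To make the on-policy ranking step concrete, here is a minimal sketch in plain Python. It is not the authors' code: `rollout_return` is a hypothetical stand-in for an environment episode (a toy quadratic fitness with noise rather than a MuJoCo task), and the names `rank_directions_on_policy`, `nu`, `top_b`, and `horizon` are illustrative assumptions. Each perturbation direction is scored by the larger of its two returns, which is how ARS is commonly described, and the function counts how many environment steps were spent versus how many belong to the directions that survive the ranking.

```python
import random


def rollout_return(theta, rng, horizon=10):
    """Hypothetical stand-in for one on-policy episode.

    Returns a noisy fitness value and the number of environment steps used.
    """
    # Toy fitness: negative squared norm of the parameters, plus small noise.
    score = -sum(x * x for x in theta) + rng.gauss(0.0, 0.01)
    return score, horizon


def rank_directions_on_policy(theta, num_directions=8, top_b=4, nu=0.1, seed=0):
    """Rank random perturbation directions by on-policy evaluation."""
    rng = random.Random(seed)
    candidates = []
    total_steps = 0
    for _ in range(num_directions):
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + nu * d for t, d in zip(theta, delta)]
        minus = [t - nu * d for t, d in zip(theta, delta)]
        # Every perturbed policy needs its own rollout to get a score.
        r_plus, steps_plus = rollout_return(plus, rng)
        r_minus, steps_minus = rollout_return(minus, rng)
        total_steps += steps_plus + steps_minus
        candidates.append(
            (max(r_plus, r_minus), delta, r_plus, r_minus, steps_plus + steps_minus)
        )
    # Keep only the top_b directions; data from the rest is discarded.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:top_b]
    used_steps = sum(c[4] for c in top)
    return top, total_steps, used_steps


top, total_steps, used_steps = rank_directions_on_policy([1.0, -2.0, 0.5])
print(f"steps spent: {total_steps}, steps behind kept directions: {used_steps}")
```

In this toy setting half of the collected steps belong to directions that are thrown away after ranking. That is the waste an off-policy ranking scheme aims to reduce, by estimating the scores without a fresh rollout for every candidate.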

research
11/16/2021

Causal policy ranking

Policies trained via reinforcement learning (RL) are often very complex ...
research
05/24/2022

Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

Most successful stochastic black-box optimizers, such as CMA-ES, use ran...
research
10/26/2022

ERL-Re^2: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation

Deep Reinforcement Learning (Deep RL) and Evolutionary Algorithm (EA) ar...
research
08/31/2020

Ranking Policy Decisions

Policies trained via Reinforcement Learning (RL) are often needlessly co...
research
07/03/2021

Supervised Off-Policy Ranking

Off-policy evaluation (OPE) leverages data generated by other policies t...
research
03/10/2017

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

We explore the use of Evolution Strategies (ES), a class of black box op...
research
03/07/2019

When random search is not enough: Sample-Efficient and Noise-Robust Blackbox Optimization of RL Policies

Interest in derivative-free optimization (DFO) and "evolutionary strateg...
