Dimensionality Reduction and Prioritized Exploration for Policy Search

03/09/2022
by   Marius Memmel, et al.

Black-box policy optimization is a class of reinforcement learning algorithms that explore and update policies at the parameter level. This class of algorithms is widely applied in robotics with movement primitives or non-differentiable policies. These approaches are particularly relevant where exploration at the action level could cause actuator damage or other safety issues. However, black-box optimization does not scale well with the increasing dimensionality of the policy, leading to a high demand for samples, which are expensive to obtain in real-world systems. In many practical applications, policy parameters do not contribute equally to the return. Identifying the most relevant parameters makes it possible to narrow the exploration and speed up learning. Furthermore, updating only the effective parameters requires fewer samples, improving the scalability of the method. We present a novel method to prioritize the exploration of effective parameters and cope with full covariance matrix updates. Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results. To select the effective parameters, we consider both the Pearson correlation coefficient and the Mutual Information. We showcase the capabilities of our approach on the Relative Entropy Policy Search algorithm in several simulated environments, including robotics simulations. Code is available at https://git.ias.informatik.tu-darmstadt.de/ias_code/aistats2022/dr-creps.
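The parameter-selection idea in the abstract can be illustrated with a minimal sketch that ranks policy parameters by the absolute Pearson correlation between sampled parameter values and episode returns. This is not the authors' implementation: the function name `rank_parameters_by_correlation`, the toy linear return, and the choice of `k` are assumptions made purely for illustration, and the Mutual Information criterion is not shown.

```python
import numpy as np


def rank_parameters_by_correlation(thetas, returns, k):
    """Hypothetical helper: pick the k parameters most correlated with the return.

    thetas:  array of shape (n_samples, n_params), sampled policy parameters
    returns: array of shape (n_samples,), episode return of each sample
    """
    thetas = np.asarray(thetas, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Center both quantities so the dot products below are covariances.
    theta_c = thetas - thetas.mean(axis=0)
    ret_c = returns - returns.mean()

    # Pearson correlation of every parameter dimension with the return.
    cov = theta_c.T @ ret_c
    denom = np.sqrt((theta_c ** 2).sum(axis=0) * (ret_c ** 2).sum())
    corr = np.abs(cov / np.maximum(denom, 1e-12))

    # Indices of the k largest absolute correlations, strongest first.
    effective = np.argsort(corr)[::-1][:k]
    return effective, corr


# Toy data (assumption): only parameters 1 and 4 influence the return.
rng = np.random.default_rng(0)
thetas = rng.normal(size=(500, 6))
returns = 3.0 * thetas[:, 1] - 2.0 * thetas[:, 4] + 0.1 * rng.normal(size=500)

effective, corr = rank_parameters_by_correlation(thetas, returns, k=2)
print("effective parameter indices:", sorted(effective.tolist()))
```

In a black-box search loop, a ranking like this could be used to concentrate exploration on the selected dimensions. Pearson correlation only captures linear dependence, which is one plausible reason to also consider Mutual Information.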

research
02/26/2020

Dimensionality Reduction of Movement Primitives in Parameter Space

Movement primitives are an important policy class for real-world robotic...
research
09/20/2017

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

The most data-efficient algorithms for reinforcement learning in robotic...
research
01/06/2022

SABLAS: Learning Safe Control for Black-box Dynamical Systems

Control certificates based on barrier functions have been a powerful too...
research
05/24/2022

Regret-Aware Black-Box Optimization with Natural Gradients, Trust-Regions and Entropy Control

Most successful stochastic black-box optimizers, such as CMA-ES, use ran...
research
10/18/2022

Deep Black-Box Reinforcement Learning with Movement Primitives

Episode-based reinforcement learning (ERL) algorithms treat reinforcement learn...
research
06/10/2021

Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

Today's advanced Reinforcement Learning algorithms produce black-box pol...
research
05/15/2017

Probabilistically Safe Policy Transfer

Although learning-based methods have great potential for robotics, one c...
