Deep Black-Box Reinforcement Learning with Movement Primitives

10/18/2022
by Fabian Otto, et al.

Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem in which we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits over step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and its exploration in parameter space is well suited to sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL technique. These layers allow us to specify trust regions for the policy update that are solved exactly for each state using convex optimization, which enables learning policies with the high precision required for ERL. We compare our ERL algorithm to state-of-the-art step-based algorithms on many complex simulated robotic control tasks. In doing so, we investigate different reward formulations: dense, sparse, and non-Markovian. While step-based algorithms perform well only on dense rewards, ERL performs favorably on sparse and non-Markovian rewards. Moreover, our results show that sparse and non-Markovian rewards are also often better suited to defining the desired behavior, allowing us to obtain considerably higher-quality policies than step-based RL.

