Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

05/23/2019
by Domingo Esteban, et al.

A common strategy for dealing with the high cost of reinforcement learning (RL) on complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn and can be reused for new problems. However, when a robot learns the policies for these subtasks, common approaches treat each policy's learning process separately. Therefore, all of these individual (composable) policies must be learned before the complex task itself can be learned through policy composition. Such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a compound Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both the compound and the composable policies within the same learning process by exploiting the off-policy data generated by the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during learning. The experimental results show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their respective tasks. Supplementary videos and code are available at https://sites.google.com/view/hrl-concurrent-discovery .
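The abstract does not give the composition formula, so the following is only a minimal sketch of one way such a combination could look, assuming diagonal covariances and an activation-weighted, precision-based fusion of the Gaussians. The function name `compose_gaussians` and its arguments are hypothetical and are not taken from the paper or its code.

```python
def compose_gaussians(means, variances, activations):
    """Fuse K diagonal Gaussian policies into one diagonal Gaussian.

    Assumed rule (not stated in the abstract): per action dimension,
    each sub-policy contributes its precision scaled by its activation.

    means, variances, activations: K lists, each of length D (action dims).
    Returns (mean, variance), each a list of length D.
    """
    num_dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(num_dims):
        # Activation-weighted precision contributed by each sub-policy.
        precisions = [w[d] / v[d] for w, v in zip(activations, variances)]
        total = sum(precisions)
        if total <= 0.0:
            raise ValueError("each dimension needs a positive activation")
        fused_var.append(1.0 / total)
        # Precision-weighted average of the sub-policy means.
        weighted = sum(p * m[d] for p, m in zip(precisions, means))
        fused_mean.append(weighted / total)
    return fused_mean, fused_var


# Two 1-D sub-policies, equally activated: the fused mean lies between them
# and the fused variance is smaller than either input variance.
mean, var = compose_gaussians(
    means=[[0.0], [2.0]],
    variances=[[1.0], [1.0]],
    activations=[[1.0], [1.0]],
)
```

Under this assumed rule, setting a sub-policy's activation to zero in a given dimension removes its influence there, which is one way a compound policy could let different subtasks act on the same action space at the same time.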

research
05/25/2019

Composing Ensembles of Policies with Deep Reinforcement Learning

Composition of elementary skills into complex behaviors to solve challen...
research
07/12/2018

Will it Blend? Composing Value Functions in Reinforcement Learning

An important property for lifelong-learning agents is the ability to com...
research
11/03/2019

Maximum Entropy Diverse Exploration: Disentangling Maximum Entropy Reinforcement Learning

Two hitherto disconnected threads of research, diverse exploration (DE) ...
research
02/14/2020

Learning Functionally Decomposed Hierarchies for Continuous Control Tasks

Solving long-horizon sequential decision making tasks in environments wi...
research
12/10/2019

AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

Robotic reinforcement learning (RL) holds the promise of enabling robots...
research
07/07/2020

Skeptic: Automatic, Justified and Privacy-Preserving Password Composition Policy Selection

The choice of password composition policy to enforce on a password-prote...
research
01/06/2020

Learning Reusable Options for Multi-Task Reinforcement Learning

Reinforcement learning (RL) has become an increasingly active area of re...