Learning Functionally Decomposed Hierarchies for Continuous Control Tasks

02/14/2020
by   Lukas Jendele, et al.
7

Solving long-horizon sequential decision making tasks in environments with sparse rewards is a longstanding problem in reinforcement learning (RL) research. Hierarchical Reinforcement Learning (HRL) has held the promise to enhance the capabilities of RL agents via operation on different levels of temporal abstraction. Despite the success of recent works in dealing with inherent nonstationarity and sample complexity, it remains difficult to generalize to unseen environments and to transfer different layers of the policy to other agents. In this paper, we propose a novel HRL architecture, Hierarchical Decompositional Reinforcement Learning (HiDe), which allows decomposition of the hierarchical layers into independent subtasks, yet allows for joint training of all layers in end-to-end manner. The main insight is to combine a control policy on a lower level with an image-based planning policy on a higher level. We evaluate our method on various complex continuous control tasks, demonstrating that generalization across environments and transfer of higher level policies, such as from a simple ball to a complex humanoid, can be achieved. See videos https://sites.google.com/view/hide-rl.

READ FULL TEXT
research
06/13/2019

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning is a promising approach to long-hori...
research
05/21/2018

Hierarchical Reinforcement Learning with Hindsight

Reinforcement Learning (RL) algorithms can suffer from poor sample effic...
research
11/11/2022

Emergency action termination for immediate reaction in hierarchical reinforcement learning

Hierarchical decomposition of control is unavoidable in large dynamical ...
research
01/04/2020

Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

How do people decide how long to continue in a task, when to switch, and...
research
10/09/2019

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Humans are masters at quickly learning many complex tasks, relying on an...
research
05/23/2019

Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

A common strategy to deal with the expensive reinforcement learning (RL)...
research
11/24/2020

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Multi-goal reaching is an important problem in reinforcement learning ne...

Please sign up or login with your details

Forgot password? Click here to reset