SHIRO: Soft Hierarchical Reinforcement Learning

12/24/2022
by Kandai Watanabe, et al.

Hierarchical Reinforcement Learning (HRL) algorithms have been demonstrated to perform well on high-dimensional decision-making and robotic control tasks. However, because they optimize solely for reward, the agent tends to search the same region of the space redundantly, which slows learning and lowers the achieved reward. In this work, we present an off-policy HRL algorithm that maximizes entropy for efficient exploration. The algorithm learns a temporally abstracted low-level policy and explores broadly through the addition of entropy to the high level. The novelty of this work is the theoretical motivation for adding entropy to the RL objective in the HRL setting. We empirically show that entropy can be added to both levels if the Kullback-Leibler (KL) divergence between consecutive updates of the low-level policy is sufficiently small. We performed an ablation study to analyze the effects of entropy on the hierarchy, in which adding entropy to the high level emerged as the most desirable configuration. Furthermore, a higher temperature in the low level leads to Q-value overestimation and increases the stochasticity of the environment that the high level operates on, making learning more challenging. Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks and requires minimal tuning.
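The abstract names two quantities: an entropy bonus added to the reward objective, and the KL divergence between consecutive low-level policies. The sketch below shows how each could be computed for a diagonal Gaussian policy. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian parameterization, and the discounted form of the entropy-augmented return are all hypothetical choices made here; the temperature `alpha` corresponds to the "temperature" mentioned in the abstract.

```python
import math


def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian, given per-dimension log std-devs."""
    const = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(ls + const for ls in log_std)


def gaussian_kl(mu_p, log_std_p, mu_q, log_std_q):
    """KL(p || q) between two diagonal Gaussians p and q.

    Assumption for illustration: p and q stand for the low-level policy
    before and after one update, evaluated at the same state.
    """
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, log_std_p, mu_q, log_std_q):
        var_p = math.exp(2.0 * lp)
        var_q = math.exp(2.0 * lq)
        total += (lq - lp) + (var_p + (mp - mq) ** 2) / (2.0 * var_q) - 0.5
    return total


def soft_return(rewards, entropies, alpha, gamma):
    """Discounted return with an entropy bonus: sum_t gamma^t * (r_t + alpha * H_t)."""
    return sum(
        (gamma ** t) * (r + alpha * h)
        for t, (r, h) in enumerate(zip(rewards, entropies))
    )
```

In this sketch, setting `alpha` to zero recovers the ordinary discounted return, and `gaussian_kl` returns zero when the policy does not change between updates. How small the KL divergence must be for entropy to be usable at both levels is a result of the paper and is not encoded here.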

