Efficient Hierarchical Exploration with Stable Subgoal Representation Learning

by Siyuan Li, et al.

Goal-conditioned hierarchical reinforcement learning (HRL) is a successful approach to solving complex, temporally extended tasks. Recently, its success has been extended to more general settings by learning hierarchical policies and subgoal representations concurrently. However, online subgoal representation learning exacerbates the non-stationarity of HRL and makes exploration harder for high-level policy learning. In this paper, we propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas while still allowing representation updates in less explored state regions. Building on this stable representation, we design measures of novelty and potential for subgoals and develop an efficient hierarchical exploration strategy that actively seeks out new, promising subgoals and states. Experimental results show that our method significantly outperforms state-of-the-art baselines on continuous control tasks with sparse rewards. They further demonstrate that the subgoal representation learning is stable and efficient, which promotes superior policy learning.
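To make the idea of a state-specific regularization concrete, here is a minimal sketch of one way such a penalty could look. It is not the paper's formulation: the visit-count threshold, the binary weighting, and the function names `stability_weights` and `state_specific_regularizer` are all assumptions made for illustration. The sketch penalizes embedding drift only for states treated as well explored, so embeddings of rarely visited states remain free to change.

```python
import numpy as np

def stability_weights(visit_counts, threshold=10):
    """Hypothetical weighting: 1.0 for states visited at least `threshold` times, else 0.0."""
    return (np.asarray(visit_counts) >= threshold).astype(float)

def state_specific_regularizer(new_emb, old_emb, weights):
    """Weighted mean of squared embedding drift (assumed form, not the paper's loss)."""
    # Squared L2 distance between the updated and previous embedding of each state.
    drift = np.sum((np.asarray(new_emb) - np.asarray(old_emb)) ** 2, axis=1)
    # Average over the states selected by the weights; guard against an empty selection.
    return float(np.sum(weights * drift) / max(np.sum(weights), 1.0))

# Toy example: three states with 2-D subgoal embeddings.
old_emb = np.zeros((3, 2))
new_emb = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
visit_counts = [20, 5, 50]  # the second state counts as less explored

weights = stability_weights(visit_counts)
penalty = state_specific_regularizer(new_emb, old_emb, weights)
```

In this toy setup the second state contributes nothing to the penalty, so its embedding can move freely while the other two are held close to their previous values.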


Directed Exploration for Reinforcement Learning

Efficient exploration is necessary to achieve good sample efficiency for...

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

We study the problem of representation learning in goal-conditioned hier...

Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning

Goal-conditioned hierarchical reinforcement learning (HRL) has shown pro...

Learning Actionable Representations with Goal-Conditioned Policies

Representation learning is a central challenge across a range of machine...

Modulated Policy Hierarchies

Solving tasks with sparse rewards is a main challenge in reinforcement l...

Exploratory State Representation Learning

Not having access to compact and meaningful representations is known to ...

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

We study the problem of learning goal-conditioned policies in Minecraft,...