Adjacency constraint for efficient hierarchical reinforcement learning

10/30/2021
by Tianren Zhang, et al.

Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency because the action space of the high-level policy, i.e., the goal space, is large. Searching in a large goal space makes both high-level subgoal generation and low-level policy learning difficult. In this paper, we show that this problem can be effectively alleviated by using an adjacency constraint to restrict the high-level action space from the whole goal space to a k-step adjacent region of the current state. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP it induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that discriminates between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks, including challenging simulated robot locomotion and manipulation tasks, show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
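To make the idea of a k-step adjacent region concrete, the following is a minimal sketch on a small deterministic grid world. It is not the authors' implementation. It assumes a hypothetical 4-connected grid with optional walls, replaces the learned adjacency network with an exact breadth-first search over transitions, and enforces the constraint by a simple projection onto the adjacent region; the abstract does not say how the constraint enters training. All function names (`shortest_distances`, `grid_neighbors`, `constrain_subgoal`) are illustrative.

```python
from collections import deque


def shortest_distances(start, neighbors, max_steps):
    """Breadth-first search: minimum step count from `start` to every
    state reachable within `max_steps` transitions."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if dist[state] == max_steps:
            continue  # do not expand beyond the k-step horizon
        for nxt in neighbors(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def grid_neighbors(width, height, walls=frozenset()):
    """Hypothetical deterministic environment: a 4-connected grid with walls."""
    def neighbors(state):
        x, y = state
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in walls:
                yield (nx, ny)
    return neighbors


def constrain_subgoal(state, proposed, neighbors, k):
    """Keep `proposed` if it lies in the k-step adjacent region of `state`;
    otherwise return the adjacent state nearest to it (Manhattan distance).
    The projection step is an assumption made for illustration."""
    region = shortest_distances(state, neighbors, k)
    if proposed in region:
        return proposed
    return min(
        sorted(region),  # sorted for a deterministic tie-break
        key=lambda g: abs(g[0] - proposed[0]) + abs(g[1] - proposed[1]),
    )


if __name__ == "__main__":
    nb = grid_neighbors(5, 5, walls=frozenset({(1, 0)}))
    region = shortest_distances((0, 0), nb, max_steps=2)
    # (2, 0) is close in coordinates but not reachable in 2 steps past the wall.
    print((2, 0) in region)
    print(constrain_subgoal((0, 0), (4, 4), nb, k=2))
```

The wall example shows why adjacency is defined by transition steps rather than by distance in coordinate space: a subgoal can be geometrically close yet outside the k-step region. In continuous state spaces such an exact search is not available, which is where a learned adjacency network, as the abstract describes, would take its place.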


