Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

08/02/2023
by Haorui Li, et al.
Hierarchical reinforcement learning composes subpolicies at different levels of a hierarchy to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, existing methods can hardly deal with the degradation problem, because they either do not consider diversity or employ weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to further boost their performance. Experimental results demonstrate that WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of WDER.
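The abstract does not give the exact form of the regularizer, so the following is only a minimal sketch of the general idea: subtract a pairwise Wasserstein diversity term from an existing loss. It assumes each subpolicy's action distribution is a diagonal Gaussian, for which the 2-Wasserstein distance has a closed form. The function names, the mean-over-pairs aggregation, and the `weight` coefficient are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def w2_diag_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between two diagonal Gaussians.

    Closed form: sqrt(||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2),
    where sigma_* are per-dimension standard deviations.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


def diversity_bonus(subpolicies):
    """Mean pairwise W2 distance over all subpolicy pairs (assumed aggregation).

    `subpolicies` is a list of (mu, sigma) tuples, one per subpolicy,
    evaluated at the same state.
    """
    pairs = list(combinations(subpolicies, 2))
    if not pairs:
        return 0.0
    return sum(w2_diag_gaussian(*p, *q) for p, q in pairs) / len(pairs)


def regularized_loss(base_loss, subpolicies, weight=0.1):
    """Subtract the diversity term so that minimizing the loss
    pushes the subpolicies' action distributions apart."""
    return base_loss - weight * diversity_bonus(subpolicies)


# Two hypothetical subpolicies with 2-D actions and equal spread.
policy_a = ([0.0, 0.0], [1.0, 1.0])
policy_b = ([3.0, 4.0], [1.0, 1.0])
loss = regularized_loss(1.0, [policy_a, policy_b], weight=0.1)
```

In this sketch the regularizer is purely additive, which reflects the abstract's claim that it can be attached to the loss of an existing method; how the paper weights or estimates the distances in practice is not stated here.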

