HAC Explore: Accelerating Exploration with Hierarchical Reinforcement Learning

08/12/2021
by Willie McClinton, et al.

Sparse rewards and long time horizons remain challenging for reinforcement learning algorithms. Exploration bonuses can help in sparse reward settings by encouraging agents to explore the state space, while hierarchical approaches can assist with long-horizon tasks by decomposing lengthy tasks into shorter subtasks. We propose HAC Explore (HACx), a new method that combines these approaches by integrating the exploration bonus method Random Network Distillation (RND) into the hierarchical approach Hierarchical Actor-Critic (HAC). HACx outperforms either component method on its own, as well as an existing approach to combining hierarchy and exploration, in a set of difficult simulated robotics tasks. HACx is the first RL method to solve a sparse reward, continuous-control task that requires over 1,000 actions.
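
As a rough illustration of the exploration-bonus idea the abstract refers to, the sketch below shows the general RND mechanism only: a fixed, randomly initialized target network and a predictor trained to match it on visited states, with the prediction error used as a novelty bonus. It is not the HACx implementation and does not show how the bonus is integrated into HAC. The class name `RNDBonus`, the linear predictor, the feature size, and the learning rate are all assumptions chosen to keep the example small.

```python
import numpy as np


class RNDBonus:
    """Toy RND-style novelty bonus (hypothetical helper, not from the paper)."""

    def __init__(self, state_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random target network: never trained.
        self.W_target = rng.normal(size=(state_dim, feat_dim))
        # Predictor (linear here for simplicity): trained to imitate the target.
        self.W_pred = np.zeros((state_dim, feat_dim))
        self.lr = lr

    def _target(self, s):
        return np.tanh(s @ self.W_target)

    def bonus(self, s):
        # Novelty bonus = mean squared prediction error on state s.
        err = s @ self.W_pred - self._target(s)
        return float(np.mean(err ** 2))

    def update(self, s):
        # One SGD step on 0.5 * ||prediction - target||^2 for a visited state.
        err = s @ self.W_pred - self._target(s)
        self.W_pred -= self.lr * np.outer(s, err)


rnd = RNDBonus(state_dim=4)
familiar = np.array([1.0, 0.0, 0.0, 0.0])  # state the agent visits repeatedly
novel = np.array([0.0, 1.0, 0.0, 0.0])     # state the agent has never visited

bonus_before = rnd.bonus(familiar)
for _ in range(200):
    rnd.update(familiar)
bonus_after = rnd.bonus(familiar)
bonus_novel = rnd.bonus(novel)
```

In this toy setting the bonus for the repeatedly visited state shrinks toward zero as the predictor fits the target there, while the unvisited state keeps a larger bonus, which is the property that pushes an agent toward unexplored regions.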

research
12/04/2017

Hierarchical Actor-Critic

We present a novel approach to hierarchical reinforcement learning calle...
research
11/12/2020

Hierarchical reinforcement learning for efficient exploration and transfer

Sparse-reward domains are challenging for reinforcement learning algorit...
research
11/30/2018

Modulated Policy Hierarchies

Solving tasks with sparse rewards is a main challenge in reinforcement l...
research
12/01/2021

Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation

Complex sequential tasks in continuous-control settings often require ag...
research
01/13/2023

Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm

Very large state spaces with a sparse reward signal are difficult to exp...
research
05/07/2020

Curious Hierarchical Actor-Critic Reinforcement Learning

Hierarchical abstraction and curiosity-driven exploration are two common...
research
10/14/2022

Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

Providing densely shaped reward functions for RL algorithms is often exc...