Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

06/24/2022
by Yunfei Li, et al.

Leveraging supervised learning (SL) to build more effective reinforcement learning (RL) methods has become a recent trend. We propose a novel phasic approach that alternates online RL and offline SL to tackle sparse-reward goal-conditioned problems. In the online phase, we perform RL training and collect rollout data; in the offline phase, we perform SL on the successful trajectories in the dataset. To further improve sample efficiency, we adopt two additional techniques in the online phase: task reduction, which generates more feasible trajectories, and a value-difference-based intrinsic reward, which alleviates the sparse-reward issue. We call the overall algorithm PhAsic self-Imitative Reduction (PAIR). PAIR substantially outperforms both non-phasic RL and phasic SL baselines on sparse-reward goal-conditioned robotic control problems, including a challenging stacking task. PAIR is the first RL method that learns to stack 6 cubes with only 0/1 success rewards from scratch.
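The alternation described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `Trajectory`, `value_difference_bonus`, `select_successes`, and `phasic_training` are hypothetical, the intrinsic reward is assumed to take the form V(s_{t+1}) - V(s_t) (the abstract only says it is value-difference-based), the dataset is assumed to accumulate across phases, and task reduction is omitted entirely.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Trajectory:
    """Hypothetical rollout record: toy 1-D states, a goal, and a 0/1 outcome."""
    states: List[float]
    goal: float
    success: bool  # sparse reward: True only if the goal was reached


def value_difference_bonus(values: Sequence[float]) -> List[float]:
    """Assumed intrinsic reward: bonus_t = V(s_{t+1}) - V(s_t).

    `values` holds value estimates along one trajectory. The exact form
    used by PAIR is not given in the abstract; this is one plausible reading.
    """
    return [values[t + 1] - values[t] for t in range(len(values) - 1)]


def select_successes(dataset: Sequence[Trajectory]) -> List[Trajectory]:
    """Keep only the successful trajectories for the offline SL phase."""
    return [traj for traj in dataset if traj.success]


def phasic_training(
    num_phases: int,
    collect_rollouts: Callable[[], List[Trajectory]],
    rl_update: Callable[[List[Trajectory]], None],
    sl_update: Callable[[List[Trajectory]], None],
) -> List[Trajectory]:
    """Alternate an online RL phase with an offline SL phase.

    The three callables are placeholders supplied by the caller.
    Accumulating the dataset across phases is an assumption of this sketch.
    """
    dataset: List[Trajectory] = []
    for _ in range(num_phases):
        # Online phase: gather rollouts and run an RL update on them.
        rollouts = collect_rollouts()
        rl_update(rollouts)
        dataset.extend(rollouts)
        # Offline phase: supervised learning on successful trajectories only.
        sl_update(select_successes(dataset))
    return dataset
```

In this sketch the policy, value function, and environment stay behind the caller-supplied callables, so the loop shows only the phase structure: RL sees every rollout, while SL sees only those that ended in success.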


