Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

05/17/2022
by Kuan Fang, et al.

General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to train from scratch. In this paper, we propose Planning to Practice (PTP), a method that makes it practical to train goal-conditioned policies for long-horizon tasks that require multiple distinct types of interactions to solve. Our approach is based on two key ideas. First, we decompose the goal-reaching problem hierarchically, with a high-level planner that sets intermediate subgoals for a low-level model-free policy using conditional subgoal generators in the latent space. Second, we propose a hybrid approach that first pre-trains both the conditional subgoal generator and the policy on previously collected data through offline reinforcement learning, and then fine-tunes the policy via online exploration. This fine-tuning process is itself facilitated by the planned subgoals, which break down the original target task into short-horizon goal-reaching tasks that are significantly easier to learn. We conduct experiments in both simulation and the real world, in which the policy is pre-trained on demonstrations of short primitive behaviors and fine-tuned for temporally extended tasks that are unseen in the offline data. Our experimental results show that PTP can generate feasible sequences of subgoals that enable the policy to efficiently solve the target tasks.
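The hierarchical decomposition described above can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding, not the authors' implementation: the encoder is a fixed random projection standing in for a learned latent encoder, the subgoal generator interpolates in latent space where PTP would sample from a trained conditional generative model, and the low-level policy is a simple proportional step toward the current subgoal rather than a model-free RL policy. The sketch only shows the control flow: a high-level planner proposes intermediate latent subgoals, and the low-level policy pursues each one over a short horizon.

```python
import numpy as np

LATENT_DIM = 8

def encode(obs):
    # Hypothetical encoder mapping an observation to a latent vector.
    # PTP plans subgoals in a learned latent space; a fixed projection
    # is used here purely as a placeholder.
    W = np.linspace(-1.0, 1.0, LATENT_DIM * obs.size).reshape(LATENT_DIM, obs.size)
    return W @ obs

def subgoal_generator(z_current, z_goal, k):
    """Hypothetical conditional subgoal generator: given the current
    latent state and the final latent goal, propose k intermediate
    latent subgoals. A trained model would sample these from a
    conditional generative model; linear interpolation is a stand-in."""
    alphas = np.linspace(0.0, 1.0, k + 2)[1:-1]  # exclude the endpoints
    return [(1.0 - a) * z_current + a * z_goal for a in alphas]

def low_level_policy(z_state, z_subgoal):
    """Hypothetical goal-conditioned low-level policy: here, a crude
    proportional step that reduces latent distance to the subgoal."""
    return 0.5 * (z_subgoal - z_state)

def rollout(obs, goal_obs, num_subgoals=3, steps_per_subgoal=10):
    """High-level planner sets subgoals; the low-level policy pursues
    each in turn, decomposing the long-horizon goal-reaching problem
    into short-horizon segments."""
    z, z_goal = encode(obs), encode(goal_obs)
    plan = subgoal_generator(z, z_goal, num_subgoals) + [z_goal]
    for z_sub in plan:
        for _ in range(steps_per_subgoal):
            z = z + low_level_policy(z, z_sub)  # latent-dynamics stub
    return z, z_goal

z_final, z_goal = rollout(np.ones(4), np.full(4, 3.0))
```

Because each segment only needs to close a fraction of the remaining latent distance, the low-level controller converges within a few steps per subgoal; this mirrors the paper's point that planned subgoals turn one hard long-horizon task into several easy short-horizon ones.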


