How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

06/07/2022
by Yecheng Jason Ma, et al.

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose Goal-conditioned f-Advantage Regression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reaching task can be formulated as a state-occupancy matching problem between a dynamics-abiding imitator agent and an expert agent that directly teleports to the goal. In contrast to prior approaches, GoFAR does not require any hindsight relabeling and optimizes its value and policy networks without interleaving the two. These distinct features give GoFAR much better offline performance and stability, as well as a statistical performance guarantee that is unattainable for prior methods. Furthermore, we demonstrate that GoFAR's training objectives can be re-purposed to learn an agent-independent goal-conditioned planner from purely offline source-domain data, which enables zero-shot transfer to new target domains. Through extensive experiments, we validate GoFAR's effectiveness in various problem settings and tasks, significantly outperforming the prior state of the art. Notably, on a real robotic dexterous manipulation task, while no other method makes meaningful progress, GoFAR acquires complex manipulation behavior that successfully accomplishes diverse goals.
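To make the phrase "regression-based" concrete, the following is a minimal sketch of advantage-weighted regression for a tabular goal-conditioned policy. It is not the paper's implementation. The abstract does not state the weighting function or the f-divergence used, so the weight `max(advantage + 1, 0)` (a chi-square style choice), the hand-written `toy_advantage` function, and all other names here are assumptions made for illustration. In the method itself the advantage would presumably come from a learned value network rather than a fixed rule.

```python
from collections import defaultdict


def chi2_weight(advantage):
    """Assumed (hypothetical) weighting: max(advantage + 1, 0)."""
    return max(advantage + 1.0, 0.0)


def fit_tabular_policy(transitions, advantage_fn):
    """Weighted maximum-likelihood fit of a tabular policy pi(a | s, g).

    Each offline transition (state, goal, action) votes for its action
    with a non-negative weight derived from its advantage. No goal
    relabeling is performed: goals are used exactly as stored.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for state, goal, action in transitions:
        weight = chi2_weight(advantage_fn(state, goal, action))
        totals[(state, goal)][action] += weight

    policy = {}
    for key, action_weights in totals.items():
        normalizer = sum(action_weights.values())
        if normalizer > 0:  # skip pairs where every weight was clipped to 0
            policy[key] = {a: w / normalizer for a, w in action_weights.items()}
    return policy


def toy_advantage(state, goal, action):
    """Hand-specified stand-in for a learned advantage on a 1-D chain."""
    step = 1 if action == "right" else -1
    moves_closer = abs(goal - (state + step)) < abs(goal - state)
    return 0.5 if moves_closer else -1.5


# Offline data at state 1 with goal 3: mostly poor actions, one good one.
dataset = [(1, 3, "left"), (1, 3, "left"), (1, 3, "left"), (1, 3, "right")]
policy = fit_tabular_policy(dataset, toy_advantage)
print(policy[(1, 3)])
```

Even though "left" appears three times as often in the data, its clipped weight is zero under this assumed weighting, so the fitted policy places all of its probability on "right". The policy fit is a single supervised pass over the dataset once the advantages are fixed, which is the sense in which value and policy optimization need not be interleaved.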

research
02/17/2023

Swapped goal-conditioned offline reinforcement learning

Offline goal-conditioned reinforcement learning (GCRL) can be challengin...
research
02/07/2023

Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability

Goal-conditioned reinforcement learning (GCRL) refers to learning genera...
research
05/17/2022

Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

General-purpose robots require diverse repertoires of behaviors to compl...
research
06/30/2022

On the Learning and Learnablity of Quasimetrics

Our world is full of asymmetries. Gravity and wind can make reaching a p...
research
10/18/2021

Discovering and Achieving Goals via World Models

How can artificial agents learn to solve many diverse tasks in complex v...
research
02/15/2023

Prioritized offline Goal-swapping Experience Replay

In goal-conditioned offline reinforcement learning, an agent learns from...
research
11/09/2022

Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration

Deep Reinforcement Learning has been successfully applied to learn robot...
