HIQL: Offline Goal-Conditioned RL with Latent States as Actions

07/22/2023
by Seohong Park, et al.

Unsupervised pre-training has recently become the bedrock for computer vision and natural language processing. In reinforcement learning (RL), goal-conditioned RL can potentially provide an analogous self-supervised approach for making use of large quantities of unlabeled (reward-free) data. However, building effective algorithms for goal-conditioned RL that can learn directly from diverse offline data is challenging, because it is hard to accurately estimate the exact value function for faraway goals. Nonetheless, goal-reaching problems exhibit structure, such that reaching distant goals entails first passing through closer subgoals. This structure can be very useful, as assessing the quality of actions for nearby goals is typically easier than for more distant goals. Based on this idea, we propose a hierarchical algorithm for goal-conditioned RL from offline data. Using one action-free value function, we learn two policies that allow us to exploit this structure: a high-level policy that treats states as actions and predicts (a latent representation of) a subgoal and a low-level policy that predicts the action for reaching this subgoal. Through analysis and didactic examples, we show how this hierarchical decomposition makes our method robust to noise in the estimated value function. We then apply our method to offline goal-reaching benchmarks, showing that our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data. Our code is available at https://seohong.me/projects/hiql/
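The robustness claim above can be illustrated with a toy simulation. The sketch below is not the authors' code or their didactic example. It assumes a 1D chain of integer states, a true value of -|state - goal|, and a hypothetical "learned" value whose Gaussian error grows in proportion to the distance from the goal. The names `noisy_value`, `flat_step`, `hierarchical_step`, and `compare`, and all parameter values, are illustrative assumptions. A flat policy compares the noisy values of the two neighboring states for the distant goal. The hierarchical variant uses the same noisy value twice: once to pick a subgoal `k` steps away, and once to pick the next step toward that subgoal.

```python
import random


def noisy_value(state, goal, rng, sigma):
    """Toy 'learned' value: the true value -|state - goal| corrupted by
    multiplicative Gaussian noise, so the error grows with distance
    (an assumption made for this illustration)."""
    true_value = -abs(state - goal)
    return true_value * (1.0 + sigma * rng.gauss(0.0, 1.0))


def flat_step(state, goal, rng, sigma):
    """Flat policy: step to whichever neighbor has the higher noisy value
    with respect to the target passed in as `goal`."""
    right = noisy_value(state + 1, goal, rng, sigma)
    left = noisy_value(state - 1, goal, rng, sigma)
    return 1 if right > left else -1


def hierarchical_step(state, goal, rng, sigma, k):
    """High level: pick the better of two subgoals k steps away, judged by
    the noisy value for the final goal. Low level: take one step toward
    the chosen subgoal, judged by the noisy value for that subgoal."""
    far_right = noisy_value(state + k, goal, rng, sigma)
    far_left = noisy_value(state - k, goal, rng, sigma)
    subgoal = state + k if far_right > far_left else state - k
    return flat_step(state, subgoal, rng, sigma)


def compare(goal=100, sigma=0.1, k=10, trials=20000, seed=0):
    """Return the fraction of trials in which each policy steps toward the
    goal (to the right) when starting from state 0."""
    rng = random.Random(seed)
    flat = sum(flat_step(0, goal, rng, sigma) == 1
               for _ in range(trials)) / trials
    hier = sum(hierarchical_step(0, goal, rng, sigma, k) == 1
               for _ in range(trials)) / trials
    return flat, hier


flat_acc, hier_acc = compare()
print(f"flat: {flat_acc:.2f}, hierarchical: {hier_acc:.2f}")
```

Under these assumptions, the hierarchical variant should pick the correct direction noticeably more often. The flat comparison has to resolve a true value gap of 2 against noise scaled by the full distance to the goal. The high-level comparison sees a gap of 2k against similar noise, and the low-level comparison sees a gap of 2 against noise scaled only by the short distance to the subgoal.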


