CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning

06/23/2023
by Jinxin Liu, et al.
Offline reinforcement learning (RL) aims to learn an optimal policy from pre-collected, labeled datasets, which eliminates the time-consuming data collection of online RL. However, offline RL still carries the heavy burden of specifying or handcrafting extrinsic rewards for every transition in the offline data. As a remedy for this labor-intensive labeling, we propose to endow offline RL tasks with a small amount of expert data and to use that limited expert data to drive intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve this, we introduce Calibrated Latent gUidancE (CLUE), which uses a conditional variational auto-encoder to learn a latent space over which intrinsic rewards can be directly quantified. CLUE's key idea is to align the intrinsic rewards with the expert intention by constraining the embeddings of expert data to a calibrated contextual representation. We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks. Empirically, we find that CLUE can effectively improve sparse-reward offline RL performance, outperform state-of-the-art offline IL baselines, and discover diverse skills from static reward-free offline data.
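
The idea of scoring transitions by their position in a learned latent space can be illustrated with a minimal sketch. The abstract does not give the reward formula, so everything here is an assumption for illustration: a fixed linear map (`encode`) stands in for the trained conditional variational auto-encoder, the mean of the expert embeddings (`expert_centroid`) stands in for the calibrated representation, and the reward is assumed to decay exponentially with squared latent distance. All function names and toy values are hypothetical.

```python
import math

def encode(transition, weights):
    """Hypothetical stand-in for a trained encoder: a fixed linear map to a latent vector."""
    return [sum(w * x for w, x in zip(row, transition)) for row in weights]

def expert_centroid(expert_latents):
    """Mean of the expert embeddings, used here as the assumed reference point."""
    n = len(expert_latents)
    dim = len(expert_latents[0])
    return [sum(z[i] for z in expert_latents) / n for i in range(dim)]

def intrinsic_reward(latent, centroid, scale=1.0):
    """Assumed reward form: exp(-scale * squared distance), which lies in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(latent, centroid))
    return math.exp(-scale * sq_dist)

# Toy data (made up): 3-dimensional transitions mapped to a 2-dimensional latent space.
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
expert_transitions = [[1.0, 1.0, 0.0], [1.2, 0.8, 0.0]]
offline_transitions = [[1.1, 0.9, 0.0], [-2.0, 3.0, 1.0]]

centroid = expert_centroid([encode(t, weights) for t in expert_transitions])
rewards = [intrinsic_reward(encode(t, weights), centroid) for t in offline_transitions]
```

In this toy setup, the first offline transition embeds close to the expert centroid and receives a reward near 1, while the second embeds far away and receives a reward near 0. The resulting rewards could then label a reward-free offline dataset before running any standard offline RL algorithm.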
