Optimal Transport for Offline Imitation Learning

03/24/2023
by Yicheng Luo, et al.

With the advent of large datasets, offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment. However, offline RL requires the dataset to be reward-annotated, which presents practical challenges when reward engineering is difficult or when obtaining reward annotations is labor-intensive. In this paper, we introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories using a few high-quality demonstrations. OTR's key idea is to use optimal transport to compute an optimal alignment between an unlabeled trajectory in the dataset and an expert demonstration. The alignment yields a similarity measure that can be interpreted as a reward, which an offline RL algorithm can then use to learn the policy. OTR is easy to implement and computationally efficient. On D4RL benchmarks, we show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
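The reward-labeling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Euclidean cost between observations, uniform weights over time steps, and a plain Sinkhorn solver for entropy-regularized optimal transport. The function names `sinkhorn_plan` and `ot_rewards`, and the parameters `epsilon` and `n_iters`, are hypothetical choices made for illustration only.

```python
import numpy as np


def sinkhorn_plan(cost, epsilon=0.05, n_iters=200):
    """Approximate an entropy-regularized OT plan between uniform marginals.

    Assumption: epsilon is scaled by the largest cost entry so that the
    kernel exp(-cost / (epsilon * scale)) stays numerically well behaved.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weight on each unlabeled step
    b = np.full(m, 1.0 / m)  # uniform weight on each expert step
    scale = cost.max() + 1e-8
    kernel = np.exp(-cost / (epsilon * scale))
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns to match marginal b
    return u[:, None] * kernel * v[None, :]


def ot_rewards(traj_obs, expert_obs, epsilon=0.05, n_iters=200):
    """Label each step of a trajectory with a reward from the OT alignment.

    traj_obs:   array of shape (T, d), observations of the unlabeled trajectory
    expert_obs: array of shape (T_e, d), observations of one expert demonstration
    Returns an array of shape (T,): the negative transport cost paid by each step.
    """
    # Pairwise Euclidean cost between every unlabeled step and every expert step.
    diff = traj_obs[:, None, :] - expert_obs[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)
    plan = sinkhorn_plan(cost, epsilon, n_iters)
    # Steps that are cheap to align with the expert receive rewards near zero.
    return -(plan * cost).sum(axis=1)
```

Under these assumptions, `ot_rewards(traj_obs, expert_obs)` returns one non-positive reward per time step, and a trajectory that stays close to the demonstration accumulates a higher total reward than one that drifts far from it. The labeled trajectories could then be passed to any offline RL algorithm.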

research
06/23/2023

CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn an optimal policy from...
research
03/30/2023

MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations

We study a new paradigm for sequential decision making, called offline P...
research
12/12/2020

Semi-supervised reward learning for offline reinforcement learning

In offline reinforcement learning (RL) agents are trained using a logged...
research
09/12/2023

Risk-Aware Reinforcement Learning through Optimal Transport Theory

In the dynamic and uncertain environments where reinforcement learning (...
research
05/15/2023

An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions

Apprenticeship learning (AL) is a process of inducing effective decision...
research
06/30/2022

Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Imitation learning holds tremendous promise in learning policies efficie...
research
02/09/2022

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

A major challenge in real-world reinforcement learning (RL) is the spars...
