
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories

10/12/2022
by Qinqing Zheng, et al.
Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing (state, action, reward) triplets at every timestep, and unlabelled trajectories that contain only state and reward information. For this setting, we develop a simple meta-algorithmic pipeline that learns an inverse-dynamics model on the labelled data to obtain proxy labels for the unlabelled data, and then applies any offline RL algorithm to the true and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful: on several D4RL benchmarks, certain offline RL algorithms can match the performance of variants trained on a fully labelled dataset even when we label only 10% of the trajectories, drawn from the low-return regime. Finally, we perform a large-scale controlled empirical study investigating the interplay between data-centric properties of the labelled and unlabelled datasets and algorithmic design choices (e.g., inverse dynamics, offline RL algorithm) to identify general trends and best practices for training RL agents on semi-supervised offline datasets.
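The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear least-squares model stands in for a learned inverse-dynamics model, the toy dynamics (s' = s + 0.5a) and the helper names `fit_inverse_dynamics`, `proxy_label`, and `make_transitions` are hypothetical, and the 10% labelled split here is random rather than drawn from the low-return regime.

```python
import numpy as np

def _features(states, next_states):
    # Inverse-dynamics input: current state, next state, and a bias term.
    return np.hstack([states, next_states, np.ones((len(states), 1))])

def fit_inverse_dynamics(states, next_states, actions):
    """Least-squares fit of a linear map (s, s') -> a (assumed stand-in for a learned model)."""
    W, *_ = np.linalg.lstsq(_features(states, next_states), actions, rcond=None)
    return W

def proxy_label(W, states, next_states):
    """Predict proxy actions for transitions whose actions were not recorded."""
    return _features(states, next_states) @ W

def make_transitions(rng, n, state_dim=3):
    # Toy environment (an assumption for illustration): s' = s + 0.5 * a.
    s = rng.normal(size=(n, state_dim))
    a = rng.normal(size=(n, state_dim))
    s_next = s + 0.5 * a
    r = -np.sum(s_next ** 2, axis=1)
    return s, a, r, s_next

rng = np.random.default_rng(0)
# Labelled data: states, actions, rewards (10% of all transitions).
s_l, a_l, r_l, sn_l = make_transitions(rng, 200)
# Unlabelled data: states and rewards only; a_u_hidden is kept just to check the proxy labels.
s_u, a_u_hidden, r_u, sn_u = make_transitions(rng, 1800)

# Step 1: learn the inverse-dynamics model on the labelled data.
W = fit_inverse_dynamics(s_l, sn_l, a_l)
# Step 2: fill in proxy actions for the unlabelled data.
a_u_proxy = proxy_label(W, s_u, sn_u)

# Step 3: merge true and proxy-labelled transitions; any offline RL
# algorithm could then be trained on this combined dataset.
combined = {
    "states": np.vstack([s_l, s_u]),
    "actions": np.vstack([a_l, a_u_proxy]),
    "rewards": np.concatenate([r_l, r_u]),
    "next_states": np.vstack([sn_l, sn_u]),
}
```

Because the toy dynamics are exactly linear, the proxy actions here match the hidden ones up to numerical error; with real environments the inverse-dynamics model would be approximate, and the quality of the proxy labels becomes one of the factors the study examines.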

01/31/2022

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Recent progress in deep learning has relied on access to large and diver...
02/27/2023

The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning

Self-supervised methods have become crucial for advancing deep learning ...
12/12/2020

Semi-supervised reward learning for offline reinforcement learning

In offline reinforcement learning (RL) agents are trained using a logged...
10/09/2022

State Advantage Weighting for Offline RL

We present state advantage weighting for offline reinforcement learning ...
07/26/2022

Offline Reinforcement Learning at Multiple Frequencies

Leveraging many sources of offline robot data requires grappling with th...
01/30/2023

Winning Solution of Real Robot Challenge III

This report introduces our winning solution of the real-robot phase of t...
10/31/2022

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

Learning to control an agent from data collected offline in a rich pixel...