How to Leverage Unlabeled Data in Offline Reinforcement Learning

02/03/2022
by Tianhe Yu, et al.

Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition. In many cases, labeling large datasets with rewards may be costly, especially if those rewards must be provided by human labelers, while collecting diverse unlabeled data might be comparatively inexpensive. How can we best leverage such unlabeled data in offline RL? One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data. In this paper, we find that, perhaps surprisingly, a much simpler method that assigns zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. While this approach might seem strange (and incorrect) at first, we provide extensive theoretical and empirical analysis that illustrates how it trades off reward bias, sample complexity, and distributional shift, often leading to good results. We characterize conditions under which this strategy is effective, and show that extending it with a simple reweighting approach can further alleviate the bias introduced by using incorrect reward labels. Our empirical evaluation confirms these findings in simulated robotic locomotion, navigation, and manipulation settings.
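The zero-reward labeling step described in the abstract can be illustrated with a minimal sketch. It assumes transitions are stored as dictionaries of NumPy arrays; the key names (`observations`, `actions`, `next_observations`, `rewards`), the function name `label_with_zero_reward`, and the `is_labeled` mask are hypothetical choices for illustration and are not taken from the paper. The reweighting extension is not shown, since the abstract does not specify its form.

```python
import numpy as np


def label_with_zero_reward(labeled, unlabeled):
    """Merge labeled and unlabeled transitions, giving unlabeled ones reward 0.

    Both inputs are dicts of NumPy arrays (hypothetical layout); `unlabeled`
    has no "rewards" entry.
    """
    n_labeled = len(labeled["observations"])
    n_unlabeled = len(unlabeled["observations"])

    # Concatenate the fields that both datasets share.
    merged = {
        key: np.concatenate([labeled[key], unlabeled[key]], axis=0)
        for key in ("observations", "actions", "next_observations")
    }

    # Unlabeled transitions receive a reward of exactly zero.
    zero_rewards = np.zeros(n_unlabeled, dtype=labeled["rewards"].dtype)
    merged["rewards"] = np.concatenate([labeled["rewards"], zero_rewards])

    # Keep track of which rewards are real, e.g. for later analysis.
    merged["is_labeled"] = np.concatenate(
        [np.ones(n_labeled, dtype=bool), np.zeros(n_unlabeled, dtype=bool)]
    )
    return merged


# Toy data: 2 labeled and 3 unlabeled transitions.
labeled = {
    "observations": np.ones((2, 4)),
    "actions": np.ones((2, 1)),
    "next_observations": np.ones((2, 4)),
    "rewards": np.array([1.0, 0.5]),
}
unlabeled = {
    "observations": np.zeros((3, 4)),
    "actions": np.zeros((3, 1)),
    "next_observations": np.zeros((3, 4)),
}
merged = label_with_zero_reward(labeled, unlabeled)
print(merged["rewards"])
```

The merged dataset could then be passed to any offline RL algorithm that consumes reward-labeled transitions; which algorithm works well with this scheme is a question the paper's analysis addresses, not this sketch.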


