The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning

02/27/2023
by Hao Hu, et al.

Self-supervised methods have become crucial for advancing deep learning by leveraging data itself to reduce the need for expensive annotations. However, how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains an open question. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose a novel Provable Data Sharing (PDS) algorithm to utilize such reward-free data for offline RL. PDS applies additional penalties to the reward function learned from labeled data to prevent overestimation, which keeps the algorithm conservative. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.
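
To make the idea of penalizing a learned reward concrete, here is a minimal sketch of one way it could look with linear features. It is not the paper's algorithm: the abstract only states that penalties are applied to a reward function learned from labeled data. The ridge-regression estimate, the elliptical uncertainty term, the function name `pessimistic_reward_labels`, and the parameters `beta` and `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def pessimistic_reward_labels(phi_labeled, rewards, phi_unlabeled, beta=1.0, lam=1.0):
    """Illustrative only: label reward-free features with a penalized reward estimate.

    phi_labeled:   (n, d) features of transitions that have reward labels
    rewards:       (n,)   observed rewards for those transitions
    phi_unlabeled: (m, d) features of reward-free transitions
    beta, lam:     hypothetical penalty scale and ridge regularizer
    """
    d = phi_labeled.shape[1]
    # Regularized covariance of the labeled features.
    cov = phi_labeled.T @ phi_labeled + lam * np.eye(d)
    # Ridge-regression estimate of a linear reward parameter.
    theta_hat = np.linalg.solve(cov, phi_labeled.T @ rewards)
    # Elliptical uncertainty sqrt(phi^T cov^{-1} phi) for each unlabeled feature.
    cov_inv_phi = np.linalg.solve(cov, phi_unlabeled.T)  # shape (d, m)
    uncertainty = np.sqrt(np.sum(phi_unlabeled.T * cov_inv_phi, axis=0))
    point_estimate = phi_unlabeled @ theta_hat
    # Subtracting the uncertainty keeps the assigned labels conservative.
    return point_estimate - beta * uncertainty, point_estimate
```

In this sketch, features that lie in directions well covered by the labeled data receive a small penalty, while features in poorly covered directions are labeled much more pessimistically. The relabeled reward-free data could then be passed to any offline RL algorithm alongside the labeled data.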

research
01/31/2022

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

Recent progress in deep learning has relied on access to large and diver...
research
02/03/2022

How to Leverage Unlabeled Data in Offline Reinforcement Learning

Offline reinforcement learning (RL) can learn control policies from stat...
research
06/08/2021

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

We propose to learn to distinguish reversible from irreversible actions ...
research
10/12/2022

Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories

Natural agents can effectively learn from multiple data sources that dif...
research
04/14/2023

Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

This paper studies reward-agnostic exploration in reinforcement learning...
research
08/08/2023

BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning

This paper introduces BarlowRL, a data-efficient reinforcement learning ...
research
08/25/2022

Light-weight probing of unsupervised representations for Reinforcement Learning

Unsupervised visual representation learning offers the opportunity to le...
