Designing an offline reinforcement learning objective from scratch

01/30/2023
by Gaon An, et al.

Offline reinforcement learning has developed rapidly in recent years, but estimating the actual performance of offline policies remains a challenge. We propose a scoring metric for offline policies that correlates strongly with actual policy performance and can be used directly for offline policy optimization in a supervised manner. To achieve this, we use the contrastive learning framework to design a scoring metric that gives high scores to policies that imitate actions yielding relatively high returns while avoiding those yielding relatively low returns. Our experiments show that 1) our scoring metric ranks offline policies more accurately and 2) policies optimized with our metric perform well on various offline reinforcement learning benchmarks. Notably, our algorithm requires much less policy-network capacity than other supervised learning-based methods and needs no additional networks such as a Q-network.
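
The abstract does not state the metric's formula, so the following is only an illustrative sketch of how a contrastive score of this general kind could look, not the authors' definition. It assumes an InfoNCE-style ratio for a single state, treats dataset actions with above-median return as positives and the rest as negatives, and uses negative squared distance as the similarity. The function name `contrastive_score`, the median split, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import numpy as np

def contrastive_score(policy_action, actions, returns, temperature=1.0):
    """Toy InfoNCE-style score for one state (hypothetical, not the paper's formula).

    The score is close to 0 when the policy's action is near high-return
    dataset actions and strongly negative when it is near low-return ones.
    """
    policy_action = np.asarray(policy_action, dtype=float)
    actions = np.asarray(actions, dtype=float)
    returns = np.asarray(returns, dtype=float)

    # Assumption: positives are dataset actions with above-median return.
    positive = returns > np.median(returns)
    if not positive.any():
        raise ValueError("need at least one action with above-median return")

    # Similarity: negative squared Euclidean distance, scaled by temperature.
    logits = -np.sum((actions - policy_action) ** 2, axis=1) / temperature

    # Subtract the max logit before exponentiating for numerical stability.
    shift = logits.max()
    log_pos = np.log(np.exp(logits[positive] - shift).sum())
    log_all = np.log(np.exp(logits - shift).sum())
    return log_pos - log_all

# Four dataset actions for one state, with the returns that followed them.
actions = [[1.0], [0.9], [-1.0], [-0.8]]
returns = [10.0, 9.0, 1.0, 2.0]

good = contrastive_score([1.0], actions, returns)   # imitates high-return actions
bad = contrastive_score([-1.0], actions, returns)   # imitates low-return actions
print(good > bad)
```

Because the score in this sketch is a differentiable function of the policy's action, it could in principle be maximized directly by gradient ascent, which is the sense in which such a metric supports supervised policy optimization without a separate Q-network.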

research
06/21/2022

Robust Task Representations for Offline Meta-Reinforcement Learning via Contrastive Learning

We study offline meta-reinforcement learning, a practical reinforcement ...
research
07/03/2021

Supervised Off-Policy Ranking

Off-policy evaluation (OPE) leverages data generated by other policies t...
research
10/12/2022

A Unified Framework for Alternating Offline Model Training and Policy Learning

In offline model-based reinforcement learning (offline MBRL), we learn a...
research
08/28/2023

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

Offline reinforcement learning aims to utilize datasets of previously ga...
research
07/10/2023

Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data

In some applications of reinforcement learning, a dataset of pre-collect...
research
08/11/2023

Learning Control Policies for Variable Objectives from Offline Data

Offline reinforcement learning provides a viable approach to obtain adva...
research
06/27/2020

Overfitting and Optimization in Offline Policy Learning

We consider the task of policy learning from an offline dataset generate...
