Boosting Offline Reinforcement Learning with Action Preference Query

06/06/2023
by   Qisen Yang, et al.
0

Training practical agents usually involve offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs. In particular, online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase. However, even limited online interactions can be inaccessible or catastrophic for high-stake scenarios like healthcare and autonomous driving. In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP). The main insight is that, compared to online fine-tuning, querying the preferences between pre-collected and learned actions can be equally or even more helpful to the erroneous estimate problem. By adaptively encouraging or suppressing policy constraint according to action preferences, OAP could distinguish overestimation from beneficial policy improvement and thus attains a more accurate evaluation of unseen data. Theoretically, we prove a lower bound of the behavior policy's performance improvement brought by OAP. Moreover, comprehensive experiments on the D4RL benchmark and state-of-the-art algorithms demonstrate that OAP yields higher (29

READ FULL TEXT

page 1

page 8

research
10/12/2021

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims...
research
11/21/2022

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the p...
research
03/09/2023

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

A compelling use case of offline reinforcement learning (RL) is to obtai...
research
07/01/2021

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Recent advance in deep offline reinforcement learning (RL) has made it p...
research
03/13/2023

Deploying Offline Reinforcement Learning with Human Feedback

Reinforcement learning (RL) has shown promise for decision-making tasks ...
research
09/04/2023

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Offline reinforcement learning (RL) optimizes the policy on a previously...
research
03/26/2023

Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning

Offline reinforcement learning agents seek optimal policies from fixed d...

Please sign up or login with your details

Forgot password? Click here to reset