A Policy-Guided Imitation Approach for Offline Reinforcement Learning

10/15/2022
by   Haoran Xu, et al.
0

Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-poicy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling where it should go so that the reward can be maximized, serving as the Prophet. By doing so, our algorithm allows state-compositionality from the dataset, rather than action-compositionality conducted in prior imitation-style methods. We dumb this new approach Policy-guided Offline RL (). demonstrates the state-of-the-art performance on D4RL, a standard benchmark for offline RL. We also highlight the benefits of in terms of improving with supplementary suboptimal data and easily adapting to new tasks by only changing the guide-poicy.

READ FULL TEXT

page 4

page 9

page 19

research
04/05/2022

Jump-Start Reinforcement Learning

Reinforcement learning (RL) provides a theoretical framework for continu...
research
11/03/2021

Curriculum Offline Imitation Learning

Offline reinforcement learning (RL) tasks require the agent to learn fro...
research
05/23/2022

Distance-Sensitive Offline Reinforcement Learning

In offline reinforcement learning (RL), one detrimental issue to policy ...
research
03/11/2023

User Retention-oriented Recommendation with Decision Transformer

Improving user retention with reinforcement learning (RL) has attracted ...
research
09/04/2023

Hundreds Guide Millions: Adaptive Offline Reinforcement Learning with Expert Guidance

Offline reinforcement learning (RL) optimizes the policy on a previously...
research
06/24/2023

Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

Offline optimization paradigms such as offline Reinforcement Learning (R...
research
12/23/2022

Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information

Motivated by the human-machine interaction such as training chatbots for...

Please sign up or login with your details

Forgot password? Click here to reset