Supported Policy Optimization for Offline Reinforcement Learning

02/13/2022
by Jialong Wu, et al.

Policy constraint methods for offline reinforcement learning (RL) typically use parameterization or regularization to constrain the policy to actions within the support set of the behavior policy. Elaborate parameterization designs usually intrude into the policy network, which can add inference cost and prevents taking full advantage of well-established online methods. Regularization methods reduce the divergence between the learned policy and the behavior policy, which may mismatch the inherent density-based definition of the support set and thereby fail to avoid out-of-distribution actions effectively. This paper presents Supported Policy OpTimization (SPOT), derived directly from the theoretical formalization of the density-based support constraint. SPOT adopts a VAE-based density estimator to explicitly model the support set of the behavior policy and introduces a simple but effective density-based regularization term that can be plugged non-intrusively into off-the-shelf off-policy RL algorithms. On standard offline RL benchmarks, SPOT substantially outperforms state-of-the-art offline RL methods. Thanks to its pluggable design, offline-pretrained SPOT models can also be fine-tuned online seamlessly.
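The core idea in the abstract, an actor objective that maximizes the Q-value while penalizing actions whose estimated density under the behavior policy is low, can be sketched as follows. This is a minimal illustration, not the paper's implementation: a kernel density estimate over the dataset's actions stands in for SPOT's VAE-based ELBO estimator, and the function names (`behavior_log_density`, `spot_actor_objective`) and the weight `lam` are hypothetical.

```python
import numpy as np

def behavior_log_density(dataset_actions, a, bandwidth=0.2):
    """Estimated log-density of action `a` under the behavior policy.

    A Gaussian kernel density estimate over the dataset's actions,
    used here as a stand-in for SPOT's VAE-based lower bound on
    log pi_beta(a | s).
    """
    d = dataset_actions.shape[1]
    sq_dists = np.sum((dataset_actions - a) ** 2, axis=1)
    log_kernels = (-sq_dists / (2 * bandwidth ** 2)
                   - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2))
    # log-mean-exp for numerical stability
    m = log_kernels.max()
    return m + np.log(np.mean(np.exp(log_kernels - m)))

def spot_actor_objective(q_value, log_density, lam=0.1):
    """SPOT-style regularized objective: maximize Q while penalizing
    actions with low estimated behavior density (out-of-support actions)."""
    return q_value + lam * log_density
```

With equal Q-values, the objective prefers the action that lies inside the estimated support of the behavior data, which is exactly the non-intrusive, density-based constraint the abstract describes: the regularizer is just an extra additive term in the actor loss, so the underlying off-policy algorithm is unchanged.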


Related research

02/22/2023  Behavior Proximal Policy Optimization
Offline reinforcement learning (RL) is a challenging setting where exist...

10/02/2021  BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning
Online interactions with the environment to collect data samples for tra...

10/17/2022  Boosting Offline Reinforcement Learning via Data Rebalancing
Offline reinforcement learning (RL) is challenged by the distributional ...

06/16/2023  Automatic Trade-off Adaptation in Offline RL
Recently, offline RL algorithms have been proposed that remain adaptive ...

01/29/2020  GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values
We present GradientDICE for estimating the density ratio between the sta...

03/14/2021  Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Many modern approaches to offline Reinforcement Learning (RL) utilize be...

05/25/2023  PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
Offline-to-online reinforcement learning (RL), by combining the benefits...
