Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization

03/28/2023
by Haoran Xu, et al.

Most offline reinforcement learning (RL) methods face a trade-off between improving the policy enough to surpass the behavior policy and constraining it to stay close to the behavior policy, because Q-values computed on out-of-distribution (OOD) actions suffer from errors caused by distributional shift. The recently proposed in-sample learning paradigm (i.e., IQL), which improves the policy by expectile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse Q-learning (SQL) and Exponential Q-learning (EQL), which adopt the same value regularization used in existing works, but in a completely in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on the D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes.
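As a rough illustration of what "in-sample" means for IQL, the sketch below fits a scalar value estimate to a handful of Q-values with an asymmetric squared (expectile) loss, the loss IQL is generally described as using. It is a minimal toy under stated assumptions, not the SQL or EQL objectives proposed in this work: the function names `expectile_loss` and `fit_expectile`, the toy array `q_in_sample`, and the learning-rate and step settings are all hypothetical. The point it shows is that the estimate can be pushed toward the best in-sample value without ever evaluating an action outside the data.

```python
import numpy as np


def expectile_loss(diff, tau):
    """Mean asymmetric squared loss: |tau - 1(diff < 0)| * diff**2."""
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def fit_expectile(q_values, tau, lr=0.1, steps=2000):
    """Gradient descent on a scalar v minimising expectile_loss(q_values - v, tau).

    Hypothetical helper for illustration; lr and steps are arbitrary choices.
    """
    v = float(np.mean(q_values))
    for _ in range(steps):
        diff = q_values - v
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = -2.0 * np.mean(weight * diff)  # derivative of the loss w.r.t. v
        v -= lr * grad
    return v


# Toy Q-values for the actions that actually appear in the dataset at one state.
q_in_sample = np.array([0.0, 1.0, 2.0, 10.0])

v_mean = fit_expectile(q_in_sample, tau=0.5)  # symmetric loss recovers the mean
v_high = fit_expectile(q_in_sample, tau=0.9)  # leans toward the largest in-sample value
```

With `tau=0.5` the loss is symmetric and the fit is the plain average of the in-sample values; raising `tau` moves the estimate toward the largest in-sample value while keeping it bounded by that value, so no unseen action is ever queried.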

