Mildly Conservative Q-Learning for Offline Reinforcement Learning

06/09/2022
by Jiafei Lyu, et al.

Offline reinforcement learning (RL) is the task of learning from a static logged dataset without further interaction with the environment. The distribution shift between the learned policy and the behavior policy requires the value function to stay conservative so that out-of-distribution (OOD) actions are not severely overestimated. However, existing approaches, which penalize unseen actions or regularize toward the behavior policy, are too pessimistic; this suppresses the generalization of the value function and hinders performance improvement. This paper explores conservatism that is mild but sufficient for offline learning without harming generalization. We propose Mildly Conservative Q-learning (MCQ), where OOD actions are actively trained by assigning them proper pseudo Q values. We theoretically show that MCQ induces a policy that behaves at least as well as the behavior policy and that no erroneous overestimation occurs for OOD actions. Experimental results on the D4RL benchmarks demonstrate that MCQ achieves remarkable performance compared with prior work. Furthermore, MCQ shows superior generalization ability when transferring from offline to online, where it significantly outperforms baselines.
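
The abstract does not say how the pseudo Q values are chosen, so the following is only a minimal tabular sketch under stated assumptions, not the authors' implementation. It assumes the pseudo target for an OOD action is the largest Q value among actions the dataset actually contains for that state, and that the two loss terms are mixed by a weight `lam`. The names `pseudo_target`, `mildly_conservative_loss`, `support`, and `lam` are hypothetical and introduced here for illustration.

```python
def pseudo_target(q, support, state):
    """Assumed pseudo target: the largest Q value over in-dataset actions."""
    return max(q[state][a] for a in support[state])


def mildly_conservative_loss(q, batch, ood_pairs, support, lam):
    """Weighted sum of an in-dataset Bellman error and an OOD regression term.

    q         : q[state][action] -> float (tabular Q function)
    batch     : list of (state, action, bellman_target) from the dataset
    ood_pairs : list of (state, action) pairs not present in the dataset
    support   : dict mapping state -> list of actions seen in the dataset
    lam       : hypothetical mixing weight in [0, 1]
    """
    # Squared Bellman error on logged transitions.
    td = sum((q[s][a] - y) ** 2 for s, a, y in batch) / len(batch)
    # Squared distance between each OOD Q value and its pseudo target.
    ood = sum(
        (q[s][a] - pseudo_target(q, support, s)) ** 2 for s, a in ood_pairs
    ) / len(ood_pairs)
    return lam * td + (1.0 - lam) * ood


# One state with three actions; action 2 never appears in the dataset
# and currently carries an inflated Q value.
q = [[1.0, 2.0, 5.0]]
support = {0: [0, 1]}
batch = [(0, 0, 1.5)]
ood_pairs = [(0, 2)]
loss = mildly_conservative_loss(q, batch, ood_pairs, support, lam=0.5)
```

Under this assumption the OOD term pulls the inflated value of action 2 down toward the best in-dataset value rather than toward an arbitrarily low penalty, which is one way to read "mild" conservatism.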

