Contextual Conservative Q-Learning for Offline Reinforcement Learning

01/03/2023
by   Ke Jiang, et al.
0

Offline reinforcement learning learns an effective policy on offline datasets without online interaction, and it attracts persistent research attention due to its potential of practical application. However, extrapolation error generated by distribution shift will still lead to the overestimation for those actions that transit to out-of-distribution(OOD) states, which degrades the reliability and robustness of the offline policy. In this paper, we propose Contextual Conservative Q-Learning(C-CQL) to learn a robustly reliable policy through the contextual information captured via an inverse dynamics model. With the supervision of the inverse dynamics model, it tends to learn a policy that generates stable transition at perturbed states, for the fact that pertuebed states are a common kind of OOD states. In this manner, we enable the learnt policy more likely to generate transition that destines to the empirical next state distributions of the offline dataset, i.e., robustly reliable transition. Besides, we theoretically reveal that C-CQL is the generalization of the Conservative Q-Learning(CQL) and aggressive State Deviation Correction(SDC). Finally, experimental results demonstrate the proposed C-CQL achieves the state-of-the-art performance in most environments of offline Mujoco suite and a noisy Mujoco setting.

READ FULL TEXT
research
06/09/2022

Mildly Conservative Q-Learning for Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a ...
research
08/09/2023

Bayesian Inverse Transition Learning for Offline Settings

Offline Reinforcement learning is commonly used for sequential decision-...
research
10/01/2021

Offline Reinforcement Learning with Reverse Model-based Imagination

In offline reinforcement learning (offline RL), one of the main challeng...
research
09/27/2022

DCE: Offline Reinforcement Learning With Double Conservative Estimates

Offline Reinforcement Learning has attracted much interest in solving th...
research
02/14/2023

Conservative State Value Estimation for Offline Reinforcement Learning

Offline reinforcement learning faces a significant challenge of value ov...
research
03/17/2023

Towards Safe Propofol Dosing during General Anesthesia Using Deep Offline Reinforcement Learning

Automated anesthesia promises to enable more precise and personalized an...
research
04/12/2022

Offline Distillation for Robot Lifelong Learning with Imbalanced Experience

Robots will experience non-stationary environment dynamics throughout th...

Please sign up or login with your details

Forgot password? Click here to reset