DCE: Offline Reinforcement Learning With Double Conservative Estimates

09/27/2022
by Chen Zhao, et al.

Offline reinforcement learning has attracted much interest as a way to address the challenges of applying traditional reinforcement learning in practice. It trains agents on previously collected datasets, without any interaction with the environment. To address the overestimation of out-of-distribution (OOD) actions, conservative estimation assigns low values to all inputs. Previous conservative estimation methods usually struggle to avoid the impact of OOD actions on Q-value estimates. In addition, these algorithms usually sacrifice some computational efficiency to achieve conservative estimation. In this paper, we propose a simple conservative estimation method, double conservative estimates (DCE), which uses two conservative estimation methods to constrain the policy. Our algorithm introduces a V-function to avoid errors on in-distribution actions while implicitly achieving conservative estimation. In addition, our algorithm uses a controllable penalty term that changes the degree of conservatism during training. We show theoretically how this method influences the estimation of OOD actions and in-distribution actions. Our experiments show separately how the two conservative estimation methods affect the estimates for all state-action pairs. DCE achieves state-of-the-art performance on D4RL.
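
The abstract does not state DCE's actual loss, so the following is only a minimal sketch of the two ideas it names: bootstrapping from a state-value estimate rather than from Q-values of sampled next actions, and a penalty whose weight controls the degree of conservatism. The function names, the squared-error form, the mean-Q penalty, and the weight `alpha` are all assumptions made for illustration, not the paper's formulation.

```python
import numpy as np


def td_target_with_v(reward, next_v, done, gamma=0.99):
    """TD target that bootstraps from V(s') instead of Q(s', a').

    No action is sampled at the next state, so no OOD action
    enters the target. (Illustrative assumption, not the paper's rule.)
    """
    return reward + gamma * (1.0 - done) * next_v


def penalized_q_loss(q_data, q_sampled, target, alpha):
    """Squared TD error on dataset actions plus a weighted penalty.

    The penalty is alpha times the mean Q-value of sampled (possibly OOD)
    actions; a larger alpha pushes those values down harder.
    (Hypothetical penalty form chosen for illustration.)
    """
    td_error = np.mean((q_data - target) ** 2)
    penalty = alpha * np.mean(q_sampled)
    return td_error + penalty


# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.0])
next_v = np.array([2.0, 3.0])
done = np.array([0.0, 1.0])
target = td_target_with_v(reward, next_v, done, gamma=0.5)

q_data = np.array([2.0, 1.0])      # Q(s, a) for dataset actions
q_sampled = np.array([4.0, 2.0])   # Q(s, a') for sampled actions

loss_mild = penalized_q_loss(q_data, q_sampled, target, alpha=0.1)
loss_strict = penalized_q_loss(q_data, q_sampled, target, alpha=1.0)
print(target, loss_mild, loss_strict)
```

In this toy batch the TD error is the same for both calls; only the penalty weight differs, which is how a single scalar can tune how conservative the Q-estimates become.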


