Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning

06/04/2022
by   Xue-Kun Jin, et al.

Value function estimation is an indispensable subroutine in reinforcement learning, and it becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error. HVE trades off bias and variance by balancing the value estimate obtained from offline data against the one obtained from a learned model. Theoretical analysis shows that HVE enjoys a better error bound than the direct methods. HVE can be leveraged in both off-policy evaluation and offline reinforcement learning, and we therefore provide one concrete algorithm for each setting: Off-policy HVE (OPHVE) and Model-based Offline HVE (MOHVE), respectively. Empirical evaluations on MuJoCo tasks corroborate the theoretical claim. OPHVE outperforms other off-policy evaluation methods on all three metrics measuring estimation effectiveness, while MOHVE achieves performance better than or comparable to state-of-the-art offline reinforcement learning algorithms. We hope that HVE can shed some light on further research on reinforcement learning from fixed data.
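The bias-variance trade-off mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the HVE algorithm from the paper, whose details the abstract does not give. It only assumes, for illustration, that a data-based estimate is unbiased but noisy, that a model-based estimate is biased but less noisy, that their errors are independent, and that the two are mixed by a scalar weight. All function and parameter names here are hypothetical.

```python
def hybrid_estimate(v_data: float, v_model: float, weight: float) -> float:
    """Convex combination of a data-based and a model-based value estimate.

    weight = 1 trusts the offline data only; weight = 0 trusts the model only.
    (Illustrative mixing rule, not taken from the paper.)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * v_data + (1.0 - weight) * v_model


def hybrid_mse(weight: float, data_var: float, model_bias: float,
               model_var: float = 0.0) -> float:
    """Mean squared error of the combination.

    Assumes the data estimate is unbiased with variance data_var, the model
    estimate has bias model_bias and variance model_var, and the two errors
    are independent.
    """
    model_mse = model_bias ** 2 + model_var
    return weight ** 2 * data_var + (1.0 - weight) ** 2 * model_mse


def best_weight(data_var: float, model_bias: float,
                model_var: float = 0.0) -> float:
    """Weight minimizing hybrid_mse under the same assumptions."""
    model_mse = model_bias ** 2 + model_var
    total = data_var + model_mse
    # If both estimators are exact, any weight works; pick the midpoint.
    return 0.5 if total == 0.0 else model_mse / total


if __name__ == "__main__":
    w = best_weight(data_var=4.0, model_bias=1.0)
    print("weight:", w)
    print("MSE, data only :", hybrid_mse(1.0, 4.0, 1.0))
    print("MSE, model only:", hybrid_mse(0.0, 4.0, 1.0))
    print("MSE, hybrid    :", hybrid_mse(w, 4.0, 1.0))
```

Under these assumptions the mixed estimate never has a larger mean squared error than the better of the two pure estimates, which is the intuition behind balancing the two sources. How the paper actually chooses the balance is described in its full text, not here.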

research
01/26/2023

Model-based Offline Reinforcement Learning with Local Misspecification

We present a model-based offline reinforcement learning policy performan...
research
08/28/2018

High-confidence error estimates for learned value functions

Estimating the value function for a fixed policy is a fundamental proble...
research
08/28/2023

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

Offline reinforcement learning aims to utilize datasets of previously ga...
research
11/10/2019

A Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
11/10/2019

Model-Based Reinforcement Learning with Adversarial Training for Online Recommendation

Reinforcement learning is effective in optimizing policies for recommend...
research
05/21/2021

On Instrumental Variable Regression for Deep Offline Policy Evaluation

We show that the popular reinforcement learning (RL) strategy of estimat...
research
11/18/2019

Gamma-Nets: Generalizing Value Estimation over Timescale

We present Γ-nets, a method for generalizing value function estimation o...
