Stable and Efficient Policy Evaluation

06/06/2020
by   Daoming Lyu, et al.
12

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy. However, there are two long-standing issues lying in this prediction problem that need to be tackled: off-policy stability and on-policy efficiency. The conventional temporal difference (TD) algorithm is known to perform very well in the on-policy setting, yet is not off-policy stable. On the other hand, the gradient TD and emphatic TD algorithms are off-policy stable, but are not on-policy efficient. This paper introduces novel algorithms that are both off-policy stable and on-policy efficient by using the oblique projection method. The empirical experimental results on various domains validate the effectiveness of the proposed approach.

READ FULL TEXT

page 7

page 8

page 10

research
09/14/2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement...
research
05/25/2017

Convergent Tree-Backup and Retrace with Function Approximation

Off-policy learning is key to scaling up reinforcement learning as it al...
research
03/01/2019

Should All Temporal Difference Learning Use Emphasis?

Emphatic Temporal Difference (ETD) learning has recently been proposed a...
research
01/27/2019

Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift

In this paper we revisit the method of off-policy corrections for reinfo...
research
02/14/2019

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

Off-policy Temporal Difference (TD) learning methods, when combined with...
research
11/15/2019

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Off-policy policy evaluation (OPE) is the problem of estimating the onli...
research
05/11/2017

A First Empirical Study of Emphatic Temporal Difference Learning

In this paper we present the first empirical study of the emphatic tempo...

Please sign up or login with your details

Forgot password? Click here to reset