An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task

06/02/2021
by   Sina Ghiassian, et al.
0

Off-policy prediction – learning the value function for one policy from data generated while following another policy – is one of the most challenging subproblems in reinforcement learning. This paper presents empirical results with eleven prominent off-policy learning algorithms that use linear function approximation: five Gradient-TD methods, two Emphatic-TD methods, Off-policy TD(λ), Vtrace, and versions of Tree Backup and ABQ modified to apply to a prediction setting. Our experiments used the Collision task, a small idealized off-policy problem analogous to that of an autonomous car trying to predict whether it will collide with an obstacle. We assessed the performance of the algorithms according to their learning rate, asymptotic error level, and sensitivity to step-size and bootstrapping parameters. By these measures, the eleven algorithms can be partially ordered on the Collision task. In the top tier, the two Emphatic-TD algorithms learned the fastest, reached the lowest errors, and were robust to parameter settings. In the middle tier, the five Gradient-TD algorithms and Off-policy TD(λ) were more sensitive to the bootstrapping parameter. The bottom tier comprised Vtrace, Tree Backup, and ABQ; these algorithms were no faster and had higher asymptotic error than the others. Our results are definitive for this task, though of course experiments with more tasks are needed before an overall assessment of the algorithms' merits can be made.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/10/2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment

Many off-policy prediction learning algorithms have been proposed in the...
research
09/21/2018

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

In reinforcement learning (RL) , one of the key components is policy eva...
research
10/03/2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

Widely-used deep reinforcement learning algorithms have been shown to fa...
research
06/25/2019

Expected Sarsa(λ) with Control Variate for Variance Reduction

Off-policy learning is powerful for reinforcement learning. However, the...
research
11/06/2018

Online Off-policy Prediction

This paper investigates the problem of online prediction learning, where...
research
01/22/2019

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target

Multi-step methods such as Retrace(λ) and n-step Q-learning have become ...
research
10/05/2016

ℓ_1 Regularized Gradient Temporal-Difference Learning

In this paper, we study the Temporal Difference (TD) learning with linea...

Please sign up or login with your details

Forgot password? Click here to reset