Parameter-free Gradient Temporal Difference Learning

05/10/2021
by   Andrew Jacobsen, et al.
0

Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a single stream of experience with which to evaluate a large number of possible courses of action, necessitating algorithms which can learn off-policy. However, the combination of off-policy learning with function approximation leads to divergence of temporal difference methods. Recent work into gradient-based temporal difference methods has promised a path to stability, but at the cost of expensive hyperparameter tuning. In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored. In this work, we combine these two lines of attack, deriving parameter-free, gradient-based temporal difference algorithms. Our algorithms run in linear time and achieve high-probability convergence guarantees matching those of GTD2 up to log factors. Our experiments demonstrate that our methods maintain high prediction performance relative to fully-tuned baselines, with no tuning whatsoever.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/25/2017

Convergent Tree-Backup and Retrace with Function Approximation

Off-policy learning is key to scaling up reinforcement learning as it al...
research
10/26/2021

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

How to select between policies and value functions produced by different...
research
07/02/2016

A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning

One of the main obstacles to broad application of reinforcement learning...
research
05/21/2014

Off-Policy Shaping Ensembles in Reinforcement Learning

Recent advances of gradient temporal-difference methods allow to learn o...
research
07/01/2020

Gradient Temporal-Difference Learning with Regularized Corrections

It is still common to use Q-learning and temporal difference (TD) learni...
research
01/31/2023

Toward Efficient Gradient-Based Value Estimation

Gradient-based methods for value estimation in reinforcement learning ha...
research
11/28/2016

Accelerated Gradient Temporal Difference Learning

The family of temporal difference (TD) methods span a spectrum from comp...

Please sign up or login with your details

Forgot password? Click here to reset