Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

06/06/2020
by Bo Liu, et al.

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal “mirror maps” to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
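To make the primal-dual view concrete, the sketch below shows the standard GTD2-style stochastic updates (a primal step on the value weights and a dual step on an auxiliary weight vector) applied to a toy policy-evaluation problem. This is a minimal illustration of the update rule, not the paper's accelerated GTD2-MP algorithm; the step sizes, the cyclic toy MDP, and the one-hot features are assumptions chosen for the example.

```python
import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 stochastic update in primal-dual form.

    theta: primal value-function weights.
    w: dual weights, tracking the expected TD error per feature.
    phi, phi_next: feature vectors of the current and next state.
    """
    # TD error under the current value estimate.
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)
    # Dual (ascent) step: move w toward the observed TD error.
    w_new = w + beta * (delta - phi.dot(w)) * phi
    # Primal (descent) step along the saddle-point gradient direction.
    theta_new = theta + alpha * phi.dot(w) * (phi - gamma * phi_next)
    return theta_new, w_new

# Toy check: a deterministic 4-state cycle with one-hot features and a
# reward of 1 on the wrap-around transition (hypothetical example MDP).
d, gamma = 4, 0.9
theta, w = np.zeros(d), np.zeros(d)
I = np.eye(d)
s = 0
for _ in range(5000):
    s_next = (s + 1) % d
    r = 1.0 if s_next == 0 else 0.0
    theta, w = gtd2_update(theta, w, I[s], I[s_next], r, gamma, 0.05, 0.05)
    s = s_next
```

With one-hot features this reduces to a tabular scheme, and theta approaches the discounted values of the cycle, with the state just before the rewarding transition valued highest. Note both updates cost O(d) per sample, which is the linear complexity the abstract contrasts with least-squares TD.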



Related research
05/20/2020

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise

Greedy-GQ is an off-policy two timescale algorithm for optimal control i...
11/20/2019

A Tale of Two-Timescale Reinforcement Learning with the Tightest Finite-Time Bound

Policy evaluation in reinforcement learning is often conducted using two...
10/16/2012

Sparse Q-learning with Mirror Descent

This paper explores a new framework for reinforcement learning based on ...
09/09/2021

Versions of Gradient Temporal Difference Learning

Sutton, Szepesvári and Maei introduced the first gradient temporal-diffe...
11/28/2016

Accelerated Gradient Temporal Difference Learning

The family of temporal difference (TD) methods span a spectrum from comp...
08/11/2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...
04/24/2019

Target-Based Temporal Difference Learning

The use of target networks has been a popular and key component of recen...