Accelerated Gradient Temporal Difference Learning

11/28/2016
by Yangchen Pan, et al.

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods. Least-squares methods make the best use of available data by directly computing the TD solution, and thus do not require tuning a typically highly sensitive learning rate parameter, but they require computation and storage that are quadratic in the number of features. Recent algorithmic developments have yielded several sub-quadratic methods that use an approximation to the least-squares TD solution, but incur bias. In this paper, we propose a new family of accelerated gradient TD (ATD) methods that (1) provide similar data efficiency benefits to least-squares methods at a fraction of the computation and storage, (2) significantly reduce parameter sensitivity compared to linear TD methods, and (3) are asymptotically unbiased. We illustrate these claims with a proof of convergence in expectation and experiments on several benchmark domains and a large-scale industrial energy allocation domain.
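To make the trade-off in the abstract concrete, the following is a minimal sketch of the two ends of the spectrum it describes: textbook linear TD(λ), which keeps O(d) state and needs a step size, and textbook LSTD(λ), which accumulates a d×d matrix and needs no step size. It is not the ATD algorithm from the paper. The two-state cycle, the one-hot features, the discount of 0.5, and all function names are assumptions chosen only for illustration.

```python
# Illustrative sketch only: textbook TD(lambda) and LSTD(lambda), NOT the paper's ATD method.
# The toy problem and all names below are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def td_lambda(transitions, d, alpha=0.1, gamma=0.5, lam=0.0):
    """Linear TD(lambda): O(d) time and memory per step, but needs a step size alpha."""
    w = [0.0] * d
    e = [0.0] * d  # accumulating eligibility trace
    for x, r, x_next in transitions:
        delta = r + gamma * dot(w, x_next) - dot(w, x)  # TD error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
        w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w

def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def lstd(transitions, d, gamma=0.5, lam=0.0):
    """LSTD(lambda): stores a d x d matrix (quadratic in d), no step size to tune."""
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    e = [0.0] * d
    for x, r, x_next in transitions:
        e = [gamma * lam * ei + xi for ei, xi in zip(e, x)]
        diff = [xi - gamma * xn for xi, xn in zip(x, x_next)]
        for i in range(d):
            b[i] += r * e[i]
            for j in range(d):
                A[i][j] += e[i] * diff[j]
    return solve(A, b)  # weights w satisfying A w = b

def two_state_cycle(n):
    """Hypothetical deterministic chain 0 -> 1 -> 0 ..., reward 1 when leaving state 0."""
    feats = [[1.0, 0.0], [0.0, 1.0]]  # one-hot (tabular) features
    rewards = [1.0, 0.0]
    out, s = [], 0
    for _ in range(n):
        s_next = 1 - s
        out.append((feats[s], rewards[s], feats[s_next]))
        s = s_next
    return out

data = two_state_cycle(2000)
w_td = td_lambda(data, d=2)
w_lstd = lstd(data, d=2)
```

On this toy chain the true values are 4/3 and 2/3, so both `w_td` and `w_lstd` should land close to those numbers. LSTD recovers them from the accumulated statistics without a step size, while TD(λ) approaches them only gradually and at a rate that depends on `alpha`.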


