Implicit Temporal Differences

12/21/2014
by Aviv Tamar, et al.

In reinforcement learning, the TD(λ) algorithm is a fundamental policy evaluation method with an efficient online implementation that is suitable for large-scale problems. One practical drawback of TD(λ) is its sensitivity to the choice of step-size. Empirically, a large step-size speeds up convergence at the cost of higher variance and a risk of instability. In this work, we introduce the implicit TD(λ) algorithm, which serves the same purpose and has the same computational cost as TD(λ) but is significantly more stable. We provide a theoretical explanation of this stability and an empirical evaluation of implicit TD(λ) on typical benchmark tasks. Our results show that implicit TD(λ) outperforms both standard TD(λ) and a state-of-the-art method that tunes the step-size automatically, and thus shows promise for wide applicability.
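To make the step-size sensitivity concrete, the following is a minimal sketch with linear value estimates and accumulating eligibility traces. The abstract does not state the implicit update rule, so the implicit variant here is an assumption for illustration: it evaluates the current-state value term at the new parameters, i.e. it solves θ' = θ + α (r + γ φ'ᵀθ − φᵀθ') e, using the Sherman-Morrison identity so the cost stays linear in the number of features. All function and variable names are hypothetical, and the exact update in the paper may differ.

```python
# Sketch only: standard vs. an assumed implicit TD(lambda) update with
# linear function approximation. Not taken from the paper's full text.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def update_trace(e, phi, gamma, lam):
    """Accumulating eligibility trace: e <- gamma * lam * e + phi."""
    return [gamma * lam * ei + pi for ei, pi in zip(e, phi)]

def td_lambda_step(theta, e, phi, phi_next, reward, alpha, gamma):
    """Standard TD(lambda): theta <- theta + alpha * delta * e."""
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    return [t + alpha * delta * ei for t, ei in zip(theta, e)]

def implicit_td_lambda_step(theta, e, phi, phi_next, reward, alpha, gamma):
    """Assumed implicit variant: solve
    (I + alpha * e * phi^T) theta' = theta + alpha * (r + gamma * phi'^T theta) * e
    in O(d) time via Sherman-Morrison.
    Requires 1 + alpha * phi^T e != 0 (true for non-negative features)."""
    target = reward + gamma * dot(phi_next, theta)
    x = [t + alpha * target * ei for t, ei in zip(theta, e)]
    shrink = alpha * dot(phi, x) / (1.0 + alpha * dot(phi, e))
    return [xi - shrink * ei for xi, ei in zip(x, e)]

# One transition between two one-hot states, with a deliberately large step-size.
theta = [0.0, 0.0]
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
e = update_trace([0.0, 0.0], phi, gamma=0.9, lam=0.5)
standard = td_lambda_step(theta, e, phi, phi_next, 1.0, alpha=10.0, gamma=0.9)
implicit = implicit_td_lambda_step(theta, e, phi, phi_next, 1.0, alpha=10.0, gamma=0.9)
print(standard, implicit)
```

In this single step the bootstrapped target is 1. With α = 10 the standard update moves the first weight to 10, far past the target, while the assumed implicit update moves it to 10/11 ≈ 0.91 and cannot overshoot. For small α the two updates nearly coincide.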
