Adaptive Temporal Difference Learning with Linear Function Approximation

02/20/2020
by Tao Sun, et al.

This paper revisits the celebrated temporal difference (TD) learning algorithm for policy evaluation in reinforcement learning. Typically, the performance of the plain-vanilla TD algorithm is sensitive to the choice of stepsizes, and TD often suffers from slow convergence. Motivated by the tight connection between the TD learning algorithm and stochastic gradient methods, we develop the first adaptive variant of the TD learning algorithm with linear function approximation, which we term AdaTD. In contrast to the original TD, AdaTD is robust, or at least less sensitive, to the choice of stepsizes. Analytically, we establish that to reach an ϵ accuracy, the number of iterations needed is Õ(ϵ^{-2} ln^4(1/ϵ) / ln^4(1/ρ)), where ρ represents the rate at which the underlying Markov chain converges to its stationary distribution. This implies that the iteration complexity of AdaTD is no worse than that of TD in the worst case. Going beyond TD, we further develop an adaptive variant of TD(λ), which is referred to as AdaTD(λ). We evaluate the empirical performance of AdaTD and AdaTD(λ) on several standard reinforcement learning tasks in OpenAI Gym with both linear and nonlinear function approximation; the results demonstrate the effectiveness of our new approaches over existing ones.
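
The abstract does not spell out the AdaTD update itself. As a rough illustration of the setting it describes (TD with linear function approximation, with a stepsize that adapts to the observed updates), the sketch below runs semi-gradient TD(0) with an optional AdaGrad-norm-style scaling. The scaling rule, the names `td0_linear` and `cycle_transitions`, the toy two-state chain, and all parameter values are assumptions made for illustration; this is a stand-in technique, not the authors' algorithm.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td0_linear(transitions, features, gamma, alpha, adaptive=False, eps=1e-8):
    """Semi-gradient TD(0) with a linear value estimate V(s) = theta . phi(s).

    If adaptive is True, the stepsize is alpha / sqrt(sum of squared update
    norms so far). This AdaGrad-norm-style rule is an assumption used here
    for illustration, not the AdaTD update from the paper.
    """
    dim = len(next(iter(features.values())))
    theta = [0.0] * dim
    accum = 0.0  # running sum of squared update-direction norms
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        # TD error: reward plus discounted next-state value minus current value.
        delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)
        g = [delta * x for x in phi]  # semi-gradient update direction
        if adaptive:
            accum += dot(g, g)
            step = alpha / (accum ** 0.5 + eps)
        else:
            step = alpha
        theta = [t + step * gi for t, gi in zip(theta, g)]
    return theta


def cycle_transitions(n_steps):
    """Hypothetical toy chain: state 0 -> state 1 with reward 1, then back with reward 0."""
    s = 0
    for _ in range(n_steps):
        s_next = 1 - s
        yield s, (1.0 if s == 0 else 0.0), s_next
        s = s_next


# One-hot features: the tabular case written as a linear approximation.
FEATURES = {0: [1.0, 0.0], 1: [0.0, 1.0]}

if __name__ == "__main__":
    plain = td0_linear(cycle_transitions(5000), FEATURES, gamma=0.5, alpha=0.1)
    ada = td0_linear(cycle_transitions(5000), FEATURES, gamma=0.5, alpha=1.0,
                     adaptive=True)
    print("plain TD(0):   ", plain)
    print("adaptive TD(0):", ada)
```

With discount 0.5 the exact values of this toy chain solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), so both runs can be checked against V(0) = 4/3 and V(1) = 2/3. The chain is deterministic, so it says nothing about the Markovian-noise regime the paper analyzes.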
