Direct Gradient Temporal Difference Learning

08/02/2023
by   Xiaochi Qian, et al.
0

Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. It, however, can lead to instability when combined with function approximation and bootstrapping, two arguably indispensable ingredients for large-scale reinforcement learning. This is the notorious deadly triad. Gradient Temporal Difference (GTD) is one powerful tool to solve the deadly triad. Its success results from solving a doubling sampling issue indirectly with weight duplication or Fenchel duality. In this paper, we instead propose a direct method to solve the double sampling issue by simply using two samples in a Markovian data stream with an increasing gap. The resulting algorithm is as computationally efficient as GTD but gets rid of GTD's extra weights. The only price we pay is a logarithmically increasing memory as time progresses. We provide both asymptotic and finite sample analysis, where the convergence rate is on-par with the canonical on-policy temporal difference learning. Key to our analysis is a novel refined discretization of limiting ODEs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/20/2023

Backstepping Temporal Difference Learning

Off-policy learning ability is an important feature of reinforcement lea...
research
06/06/2020

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learni...
research
06/18/2019

Gap-Increasing Policy Evaluation for Efficient and Noise-Tolerant Reinforcement Learning

In real-world applications of reinforcement learning (RL), noise from in...
research
08/11/2021

Truncated Emphatic Temporal Difference Methods for Prediction and Control

Emphatic Temporal Difference (TD) methods are a class of off-policy Rein...
research
02/15/2020

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

Despite the wide applications of Adam in reinforcement learning (RL), th...
research
09/29/2020

Finite-Time Analysis for Double Q-learning

Although Q-learning is one of the most successful algorithms for finding...
research
04/24/2019

Target-Based Temporal Difference Learning

The use of target networks has been a popular and key component of recen...

Please sign up or login with your details

Forgot password? Click here to reset