Target-Based Temporal Difference Learning

04/24/2019
by   Donghwan Lee, et al.
0

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide theoretical analysis on their convergences. In contrast to the standard TD-learning, target-based TD algorithms maintain two separate learning parameters-the target variable and online variable. Particularly, we introduce three members in the family, called the averaging TD, double TD, and periodic TD, where the target variable is updated through an averaging, symmetric, or periodic fashion, mirroring those techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we also provide some simulation results showing potentially superior convergence of these target-based TD algorithms compared to the standard TD-learning. While this work focuses on linear function approximation and policy evaluation setting, we consider this as a meaningful step towards the theoretical understanding of deep Q-learning variants with target networks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/23/2020

Periodic Q-Learning

The use of target networks is a common practice in deep reinforcement le...
research
12/04/2019

A Unified Switching System Perspective and O.D.E. Analysis of Q-Learning Algorithms

In this paper, we introduce a unified framework for analyzing a large fa...
research
06/06/2020

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

In this paper, we introduce proximal gradient temporal difference learni...
research
02/24/2023

Why Target Networks Stabilise Temporal Difference Methods

Integral to recent successes in deep reinforcement learning has been a c...
research
01/21/2021

Breaking the Deadly Triad with a Target Network

The deadly triad refers to the instability of a reinforcement learning a...
research
09/09/2021

Versions of Gradient Temporal Difference Learning

Sutton, Szepesvári and Maei introduced the first gradient temporal-diffe...
research
08/02/2023

Direct Gradient Temporal Difference Learning

Off-policy learning enables a reinforcement learning (RL) agent to reaso...

Please sign up or login with your details

Forgot password? Click here to reset