Backstepping Temporal Difference Learning

02/20/2023
by Han-Dong Lim, et al.

Off-policy learning ability is an important feature of reinforcement learning (RL) for practical applications. However, even one of the most elementary RL algorithms, temporal-difference (TD) learning, is known to suffer from divergence when the off-policy scheme is used together with linear function approximation. To overcome this divergent behavior, several off-policy TD-learning algorithms have been developed to date, including gradient-TD learning (GTD) and TD-learning with correction (TDC). In this work, we provide a unified view of such algorithms from a purely control-theoretic perspective and propose a new convergent algorithm. Our method relies on the backstepping technique, which is widely used in nonlinear control theory. Finally, convergence of the proposed algorithm is experimentally verified in environments where standard TD-learning is known to be unstable.

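The divergence the abstract refers to is easiest to see on Baird-style counterexamples. The sketch below is an illustrative assumption, not the paper's backstepping algorithm: it contrasts plain off-policy TD(0) with linear function approximation against the TDC (gradient-corrected) update on a toy 7-state problem. The feature matrix, behavior/target policies, step sizes, and all variable names are assumptions chosen only for demonstration.

```python
# Illustrative sketch only: off-policy linear TD(0) vs. TDC on a Baird-style toy problem.
# This is NOT the paper's backstepping method; setup and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# 7 states, 8 features, reward always 0 (so the true value weights are all zero).
n_states, n_feat, gamma = 7, 8, 0.99
Phi = np.zeros((n_states, n_feat))
for i in range(6):
    Phi[i, i] = 2.0      # v(s_i) = 2*theta_i + theta_8 for the first six states
    Phi[i, 7] = 1.0
Phi[6, 6] = 1.0          # v(s_7) = theta_7 + 2*theta_8
Phi[6, 7] = 2.0

def run(use_tdc, steps=10000, alpha=0.01, beta=0.05):
    theta = np.ones(n_feat)   # value-function weights (nonzero initialization)
    w = np.zeros(n_feat)      # TDC auxiliary weights (second time scale)
    s = rng.integers(n_states)
    for _ in range(steps):
        # Behavior policy: "dashed" action (uniform over states 0..5) w.p. 6/7,
        # "solid" action (go to state 6) w.p. 1/7. Target policy: always "solid".
        if rng.random() < 1.0 / 7.0:
            s_next, rho = 6, 7.0                 # importance ratio pi/b = 1 / (1/7)
        else:
            s_next, rho = rng.integers(6), 0.0   # target policy never takes "dashed"
        phi, phi_next, r = Phi[s], Phi[s_next], 0.0
        delta = r + gamma * phi_next @ theta - phi @ theta   # TD error
        if use_tdc:
            # TDC update: the correction term counteracts the bias that makes
            # plain off-policy semi-gradient TD unstable.
            theta += alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
            w += beta * rho * (delta - phi @ w) * phi
        else:
            theta += alpha * rho * delta * phi   # plain off-policy TD(0)
        s = s_next
    return np.linalg.norm(theta)

print("||theta|| after off-policy TD(0):", run(use_tdc=False))  # typically grows very large
print("||theta|| after TDC            :", run(use_tdc=True))    # typically stays bounded
```

Under these assumptions, the importance-weighted TD(0) iterate typically grows without bound (the classic Baird instability), while the two-time-scale TDC correction keeps the weights bounded; the exact numbers depend on the seed and step sizes. The paper's backstepping approach addresses the same class of instabilities from a control-theoretic angle.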

Related research

11/13/2019 - A Convergent Off-Policy Temporal Difference Algorithm
Learning the value function of a given policy (target policy) from the d...

08/02/2023 - Direct Gradient Temporal Difference Learning
Off-policy learning enables a reinforcement learning (RL) agent to reaso...

03/21/2019 - Towards Characterizing Divergence in Deep Q-Learning
Deep Q-Learning (DQL), a family of temporal difference algorithms for co...

02/28/2016 - Investigating practical linear temporal difference learning
Off-policy reinforcement learning has many applications including: learn...

08/18/2023 - Baird Counterexample Is Solved: with an example of How to Debug a Two-time-scale Algorithm
Baird counterexample was proposed by Leemon Baird in 1995, first used to...

05/07/2019 - A Complementary Learning Systems Approach to Temporal Difference Learning
Complementary Learning Systems (CLS) theory suggests that the brain uses...

02/16/2022 - On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting
This paper deals with solving continuous time, state and action optimiza...
