Analysis of Temporal Difference Learning: Linear System Approach

04/22/2022
by Donghwan Lee, et al.

The goal of this technical note is to introduce a new finite-time convergence analysis of temporal difference (TD) learning based on stochastic linear system models. TD-learning is a fundamental reinforcement learning (RL) algorithm that evaluates a given policy by estimating the corresponding value function of a Markov decision process. Although a series of works has analyzed TD-learning theoretically, it was not until recently that researchers established guarantees on its statistical efficiency by developing finite-time error bounds. In this paper, we propose a simple control-theoretic finite-time analysis of TD-learning, which exploits linear system models and standard notions from the linear systems community. The proposed work provides new simple templates for RL analysis, and additional insights into TD-learning and RL based on ideas from control theory.
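For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of tabular TD(0) policy evaluation, not the formulation used in the paper. The transition matrix `P`, reward vector `r`, discount `gamma`, step size `alpha`, and the function name `td0_policy_evaluation` are all hypothetical choices made for illustration. With a fixed policy, the averaged update is affine in the value estimate, which is one way to see why linear system tools can apply; the exact value function solves the linear equation (I - gamma P) V = r and is used here as a reference.

```python
import numpy as np


def td0_policy_evaluation(P, r, gamma, alpha, num_steps, seed=0):
    """Tabular TD(0) on the Markov reward process induced by a fixed policy.

    P[s, s'] is the transition probability under the policy and r[s] is the
    expected reward received in state s (both assumed known only to the
    simulator, not to the learner).
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    V = np.zeros(n)  # value estimate, one entry per state
    s = 0            # arbitrary initial state
    for _ in range(num_steps):
        s_next = rng.choice(n, p=P[s])              # sample the next state
        td_error = r[s] + gamma * V[s_next] - V[s]  # one-step TD error
        V[s] += alpha * td_error                    # constant step-size update
        s = s_next
    return V


# Hypothetical 3-state chain, chosen only for illustration.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Exact value function: solve (I - gamma * P) V = r.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# TD(0) estimate from a single simulated trajectory.
V_td = td0_policy_evaluation(P, r, gamma, alpha=0.01, num_steps=100_000)

print("exact :", V_true)
print("TD(0) :", V_td)
```

With a constant step size the estimate does not converge exactly; it fluctuates around the true value, and finite-time error bounds of the kind the abstract refers to quantify how large that error can be after a given number of steps.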


