On the Statistical Benefits of Temporal Difference Learning

01/30/2023
by David Cheikhi, et al.

Given a dataset of actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by minimizing the degree of temporal inconsistency between estimates made at successive time-steps. Focusing on finite-state Markov chains, we provide a crisp asymptotic theory of the statistical advantages of this approach. First, we show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates. Depending on problem structure, the reduction could be enormous or nonexistent. Next, we prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states: TD's errors are bounded in terms of a novel measure, the problem's trajectory crossing time, which can be much smaller than the problem's time horizon.
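To make the contrast between the two approaches concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes an undiscounted, episodic, tabular setting, and the state names, the `TERMINAL` marker, and the two-trajectory dataset are all hypothetical. The TD estimate is computed as the fixed point of the empirical one-step Bellman equations, which is the solution that tabular batch TD(0) converges to.

```python
from collections import defaultdict

# Hypothetical setup: a trajectory is a list of (state, reward, next_state)
# tuples, and TERMINAL is an absorbing end marker whose value is 0.
TERMINAL = "end"


def direct_values(trajectories):
    """Direct estimation: average the observed reward-to-go from each state."""
    totals, counts = defaultdict(float), defaultdict(int)
    for traj in trajectories:
        reward_to_go = 0.0
        # Walk backwards so reward_to_go accumulates all later rewards.
        for state, reward, _ in reversed(traj):
            reward_to_go += reward
            totals[state] += reward_to_go
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


def td_values(trajectories, sweeps=1000):
    """Batch TD fixed point: values consistent with the empirical one-step model."""
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    next_counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for state, reward, next_state in traj:
            reward_sum[state] += reward
            visits[state] += 1
            next_counts[state][next_state] += 1

    values = defaultdict(float)  # unseen states and TERMINAL default to 0
    for _ in range(sweeps):
        for s in visits:
            expected_next = sum(
                n * values[nxt] for nxt, n in next_counts[s].items()
            ) / visits[s]
            # Enforce temporal consistency: V(s) = mean reward + mean next value.
            values[s] = reward_sum[s] / visits[s] + expected_next
    return {s: values[s] for s in visits}


# Two trajectories that start in different states but cross at state "C".
dataset = [
    [("A", 0.0, "C"), ("C", 1.0, TERMINAL)],
    [("B", 0.0, "C"), ("C", 3.0, TERMINAL)],
]

direct = direct_values(dataset)  # {"A": 1.0, "B": 3.0, "C": 2.0}
td = td_values(dataset)          # {"A": 2.0, "B": 2.0, "C": 2.0}
```

In this toy dataset the direct estimate for `A` relies only on the single trajectory that started there, while the TD estimate for `A` draws on both visits to `C`. This sharing of data across trajectories that pass through a common state is, loosely, the kind of pooling the abstract alludes to; the sketch does not compute the paper's pooling coefficient or crossing time.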

research
06/11/2021

Preferential Temporal Difference Learning

Temporal-Difference (TD) learning is a general and very useful tool for ...
research
04/20/2022

Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation

In this paper, we consider the policy evaluation problem in multi-agent ...
research
08/15/2020

Reducing Sampling Error in Batch Temporal Difference Learning

Temporal difference (TD) learning is one of the main foundations of mode...
research
10/27/2020

γ-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction

We introduce the γ-model, a predictive model of environment dynamics wit...
research
12/30/2016

Adaptive Lambda Least-Squares Temporal Difference Learning

Temporal Difference learning or TD(λ) is a fundamental algorithm in the ...
research
04/29/2021

Uncertainty Principles in Risk-Aware Statistical Estimation

We present a new uncertainty principle for risk-aware statistical estima...
research
11/07/2022

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification

We study non-parametric estimation of the value function of an infinite-...
