An Analysis of Quantile Temporal-Difference Learning

01/11/2023
by Mark Rowland, et al.

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.
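
The abstract does not spell out the QTD update itself. As a rough illustration only, the sketch below shows the commonly stated tabular form of the rule, in which each state holds m quantile estimates and estimate i is nudged towards the quantile level tau_i = (2i - 1) / (2m) of the bootstrapped targets. The function name `qtd_update`, the dictionary layout of `theta`, and the default step size and discount are assumptions made for this example, not details taken from the paper's text as reproduced here.

```python
def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.9):
    """Apply one tabular QTD step to the quantile estimates theta[x], in place.

    theta  : dict mapping state -> list of m quantile estimates (hypothetical layout)
    x      : current state
    r      : observed reward
    x_next : observed next state
    """
    m = len(theta[x])
    # Snapshot the current estimates so every quantile is updated
    # from the same pre-update values.
    current = list(theta[x])
    # Bootstrapped targets, computed before theta[x] changes
    # (this matters when x_next == x).
    targets = [r + gamma * z for z in theta[x_next]]
    for i in range(m):
        tau = (2 * i + 1) / (2 * m)  # quantile level (0-indexed i)
        # Fraction of targets lying strictly below the current estimate.
        below = sum(1 for t in targets if t < current[i]) / m
        # The increment is bounded in [-alpha, alpha] regardless of how far
        # the targets are from the estimate; it depends only on the ordering.
        theta[x][i] = current[i] + alpha * (tau - below)
    return theta


if __name__ == "__main__":
    theta = {0: [0.0, 0.0], 1: [1.0, 1.0]}
    qtd_update(theta, x=0, r=0.0, x_next=1, alpha=0.1, gamma=0.5)
    print(theta[0])
```

Because the increment depends only on whether each target lies above or below the current estimate, the update is not a noisy version of a contraction mapping, which is consistent with the abstract's remark that standard TD analysis tools do not apply directly.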


research
05/28/2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

We study the problem of temporal-difference-based policy evaluation in r...
research
04/07/2021

Finite-Sample Analysis for Two Time-scale Non-linear TDC with General Smooth Function Approximation

Temporal-difference learning with gradient correction (TDC) is a two tim...
research
12/29/2021

Control Theoretic Analysis of Temporal Difference Learning

The goal of this paper is to investigate a control theoretic analysis of...
research
02/28/2019

A numerical scheme for the quantile hedging problem

We consider the numerical approximation of the quantile hedging price in...
research
03/02/2020

Risk-Averse Learning by Temporal Difference Methods

We consider reinforcement learning with performance evaluated by a dynam...
research
02/20/2020

Adaptive Temporal Difference Learning with Linear Function Approximation

This paper revisits the celebrated temporal difference (TD) learning alg...
research
06/30/2023

TD Convergence: An Optimization Perspective

We study the convergence behavior of the celebrated temporal-difference ...
