The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

05/28/2023
by Mark Rowland, et al.

We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting.
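
To make the contrast concrete, below is a minimal tabular sketch of the two updates being compared: classical TD(0), which tracks only the mean return, and a QTD-style update based on the standard quantile-regression subgradient at midpoint quantile levels. The function names, step size, discount factor, and number of quantiles are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """Classical TD(0): move the mean-value estimate toward the bootstrapped target."""
    target = r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])

def qtd_update(theta, s, r, s_next, alpha=0.1, gamma=0.99):
    """QTD-style update (illustrative): theta[s] holds m quantile estimates of the return.

    Each estimate theta[s, i] at level tau_i = (2i + 1) / (2m) is nudged by the
    quantile-regression subgradient against the bootstrapped target distribution
    r + gamma * theta[s_next, :].
    """
    m = theta.shape[1]
    taus = (2 * np.arange(m) + 1) / (2 * m)   # midpoint quantile levels
    targets = r + gamma * theta[s_next]       # m bootstrapped target samples
    for i in range(m):
        # Fraction of targets falling below the current i-th quantile estimate.
        below = np.mean(targets < theta[s, i])
        theta[s, i] += alpha * (taus[i] - below)

# Usage on a single observed transition (s=0, r=1.0, s'=1).
n_states, m = 5, 32
V = np.zeros(n_states)             # classical TD value table
theta = np.zeros((n_states, m))    # QTD quantile table
td_update(V, s=0, r=1.0, s_next=1)
qtd_update(theta, s=0, r=1.0, s_next=1)

# Even if only the mean return is of interest, a value estimate can be read
# off the QTD table by averaging the learned quantiles.
value_estimate_from_qtd = theta[0].mean()
```

The last line illustrates the setting studied in the abstract: the practitioner cares only about the mean return, yet obtains it by averaging the quantile estimates that QTD learns.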
