The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

07/15/2022
by Yunhao Tang, et al.

We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the multi-step setting. We identify a novel notion of path-dependent distributional TD error, which is indispensable for principled multi-step distributional RL. The distinction from the value-based case has important implications for concepts such as backward-view algorithms. Our work provides the first theoretical guarantees for multi-step off-policy distributional RL algorithms, including results that apply to the small number of existing approaches to multi-step distributional RL. In addition, we derive a novel algorithm, Quantile Regression-Retrace, which leads to a deep RL agent, QR-DQN-Retrace, that shows empirical improvements over QR-DQN on the Atari-57 benchmark. Collectively, our results shed light on how the unique challenges of multi-step distributional RL can be addressed in both theory and practice.
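For orientation, the value-based case that the abstract contrasts against can be sketched in a few lines. The snippet below is a minimal illustration of the standard value-based Retrace target from the off-policy learning literature, in which each one-step TD error depends only on its own transition. It is not the paper's Quantile Regression-Retrace algorithm and does not implement the path-dependent distributional TD error. The function name `retrace_target` and all argument names are hypothetical, chosen for this example.

```python
import numpy as np

def retrace_target(q_sa, rewards, exp_next_q, pi_probs, mu_probs,
                   gamma=0.99, lam=1.0):
    """Value-based Retrace target for the first pair (x_0, a_0) of a trajectory.

    All arguments are illustrative assumptions, indexed by t = 0..T-1:
      q_sa[t]       : current estimate Q(x_t, a_t)
      rewards[t]    : reward r_t
      exp_next_q[t] : E_{a ~ pi} Q(x_{t+1}, a)
      pi_probs[t]   : target-policy probability pi(a_t | x_t)
      mu_probs[t]   : behavior-policy probability mu(a_t | x_t)
    """
    q_sa = np.asarray(q_sa, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    exp_next_q = np.asarray(exp_next_q, dtype=float)
    pi_probs = np.asarray(pi_probs, dtype=float)
    mu_probs = np.asarray(mu_probs, dtype=float)

    # One-step TD errors: each uses only the transition at time t.
    td = rewards + gamma * exp_next_q - q_sa

    # Truncated importance weights c_t = lam * min(1, pi / mu).
    c = lam * np.minimum(1.0, pi_probs / mu_probs)

    # Trace coefficient at time t is c_1 * ... * c_t (empty product = 1 at t = 0).
    traces = np.concatenate(([1.0], np.cumprod(c[1:])))

    discounts = gamma ** np.arange(len(td))
    return q_sa[0] + np.sum(discounts * traces * td)
```

When the target and behavior policies coincide and `lam=1`, every trace coefficient equals 1 and the target reduces to a plain multi-step return. When the target policy assigns zero probability to an action the behavior policy took, the trace is cut at that step.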

research
01/30/2019

A Comparative Analysis of Expected and Distributional Reinforcement Learning

Since their introduction a year ago, distributional approaches to reinfo...
research
07/28/2020

Munchausen Reinforcement Learning

Bootstrapping is a core mechanism in Reinforcement Learning (RL). Most a...
research
05/20/2018

Nonlinear Distributional Gradient Temporal-Difference Learning

We devise a distributional variant of gradient temporal-difference (TD) ...
research
04/27/2023

One-Step Distributional Reinforcement Learning

Reinforcement learning (RL) allows an agent interacting sequentially wit...
research
02/03/2023

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-pe...
research
02/22/2021

Distributional data analysis via quantile functions and its application to modelling digital biomarkers of gait in Alzheimer's Disease

With the advent of continuous health monitoring via wearable devices, us...
research
05/25/2023

The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

While distributional reinforcement learning (RL) has demonstrated empiri...
