Nonlinear Distributional Gradient Temporal-Difference Learning

05/20/2018
by Chao Qu, et al.

We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning (RL) has been shown to outperform conventional expected-value RL in a recent study (Bellemare et al., 2017). We design two new algorithms, distributional GTD2 and distributional TDC, by applying the Cramér distance to a distributional version of the Bellman error objective function; the algorithms inherit the advantages of both nonlinear gradient TD algorithms and the distributional RL approach. We prove asymptotic almost-sure convergence to a locally optimal solution for general smooth function approximators, a class that includes the neural networks widely used in recent work on real-world RL problems. The per-step computational complexity is linear in the number of parameters of the function approximator, so the algorithms can be implemented efficiently for neural networks.
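
As a rough illustration of the Cramér distance mentioned in the abstract, the sketch below computes it for two categorical distributions that share a sorted set of atoms. It assumes the squared form, i.e. the integral of the squared difference between the two cumulative distribution functions (some authors take the square root instead). The function name `cramer_distance_sq` and the categorical representation are hypothetical choices made for this example, not taken from the paper.

```python
from itertools import accumulate


def cramer_distance_sq(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    p, q  : probabilities over the same atoms (each should sum to 1).
    atoms : strictly increasing support points shared by p and q.

    Assumption for this sketch: the distance is the integral of
    (F_p(x) - F_q(x))^2 over x, where F is the step-function CDF.
    """
    if not (len(p) == len(q) == len(atoms)):
        raise ValueError("p, q and atoms must have the same length")

    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    total = 0.0
    # Both CDFs are constant on [atoms[i], atoms[i + 1]), so the integral
    # reduces to a sum of interval width times squared CDF gap.
    for i in range(len(atoms) - 1):
        width = atoms[i + 1] - atoms[i]
        total += width * (cdf_p[i] - cdf_q[i]) ** 2
    return total


if __name__ == "__main__":
    atoms = [0.0, 1.0, 2.0]
    point_mass_low = [1.0, 0.0, 0.0]
    point_mass_high = [0.0, 0.0, 1.0]
    print(cramer_distance_sq(point_mass_low, point_mass_high, atoms))
```

For two point masses, the CDF gap equals 1 on the whole interval between them, so the result is simply the distance between the two atoms.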
