A Cramér Distance perspective on Non-crossing Quantile Regression in Distributional Reinforcement Learning

10/01/2021
by Alix Lhéritier, et al.

Distributional reinforcement learning (DRL) extends the value-based approach by using a deep convolutional network to approximate the full distribution over future returns instead of the mean only, providing a richer signal that leads to improved performance. Quantile-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, because the resulting gradients are biased, the quantile regression loss is used instead for training; it guarantees the same minimizer while enjoying unbiased gradients. Recently, monotonicity constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. The contribution of this work is in the setting of fixed quantile levels and is twofold. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under monotonicity constraints, the squared Cramér and the quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a novel non-crossing neural architecture that allows a good training performance using a novel algorithm to compute the Cramér distance, yielding significant improvements over QR-DQN in a number of games of the standard Atari 2600 benchmark.
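To make the two losses discussed in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation: a quantile regression (pinball) loss for fixed quantile levels, as used to train QR-DQN, and a squared Cramér distance between two uniform staircase distributions, computed by integrating the squared CDF difference over a merged support grid. The function names and the merged-grid evaluation are illustrative assumptions.

```python
import numpy as np

def quantile_regression_loss(theta, samples, taus):
    """Pinball loss averaged over samples.

    theta: (N,) quantile estimates at fixed levels taus: (N,).
    Minimized when each theta_i is the tau_i-quantile of the samples.
    """
    u = samples[None, :] - theta[:, None]                      # (N, M) errors
    loss = np.where(u >= 0, taus[:, None] * u, (taus[:, None] - 1.0) * u)
    return float(loss.mean())

def cramer_distance_sq(atoms_p, atoms_q):
    """Squared Cramér (l2) distance between two uniform staircase
    distributions, i.e. the integral of (F_p(x) - F_q(x))**2.

    Each distribution places equal mass on its atoms; the CDFs are
    piecewise constant, so the integral is an exact finite sum over
    the merged grid of atom locations.
    """
    grid = np.sort(np.concatenate([atoms_p, atoms_q]))
    Fp = np.searchsorted(np.sort(atoms_p), grid, side="right") / len(atoms_p)
    Fq = np.searchsorted(np.sort(atoms_q), grid, side="right") / len(atoms_q)
    dx = np.diff(grid)                                         # interval widths
    return float(np.sum((Fp[:-1] - Fq[:-1]) ** 2 * dx))
```

For example, two single-atom distributions at 0 and 1 have CDFs differing by 1 on the unit interval, so the squared Cramér distance is 1; identical atom sets give 0.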


Related research

05/26/2023 · Distributional Reinforcement Learning with Dual Expectile-Quantile Regression
Successful applications of distributional reinforcement learning with qu...

06/14/2018 · Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcemen...

06/12/2018 · Improving Regression Performance with Distributional Losses
There is growing evidence that converting targets to soft targets in sup...

02/09/2021 · Regularization Strategies for Quantile Regression
We investigate different methods for regularizing quantile regression wh...

06/13/2022 · IGN: Implicit Generative Networks
In this work, we build on recent advances in distributional reinforcement l...

11/05/2018 · QUOTA: The Quantile Option Architecture for Reinforcement Learning
In this paper, we propose the Quantile Option Architecture (QUOTA) for e...

11/08/2021 · Solution to the Non-Monotonicity and Crossing Problems in Quantile Regression
This paper proposes a new method to address the long-standing problem of...
