Bayesian Distributional Policy Gradients

03/20/2021
by Luchen Li, et al.

Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return. This provides more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous works in distributional RL focused mainly on computing the state-action-return distributions; here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between the target and model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows us to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. We demonstrate on a suite of Atari 2600 games and MuJoCo tasks, including well-known hard-exploration challenges, that BDPG generally learns faster and reaches higher asymptotic performance than reference distributional RL algorithms.
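To make two ingredients named in the abstract concrete, the sketch below shows a sample-based distributional Bellman target for a state-return distribution and the 1-Wasserstein distance between two sets of return samples. It is only an illustration under simplifying assumptions: returns are represented by equal-sized sets of scalar samples, and the function names `bellman_target_samples` and `wasserstein1` are hypothetical, not taken from the paper. It does not reproduce BDPG itself, which the abstract describes as relying on adversarial training and a variational posterior.

```python
import numpy as np


def bellman_target_samples(reward, next_return_samples, gamma=0.99, done=False):
    """Sample-based distributional Bellman target: r + gamma * Z(s').

    Assumption: the return distribution at the next state is represented
    by a 1-D array of scalar samples.
    """
    next_return_samples = np.asarray(next_return_samples, dtype=float)
    if done:
        # No future return after a terminal transition.
        return np.full_like(next_return_samples, reward)
    return reward + gamma * next_return_samples


def wasserstein1(samples_a, samples_b):
    """1-Wasserstein distance between two equal-sized 1-D empirical distributions.

    For equal sample counts in one dimension, the optimal coupling matches
    sorted samples, so the distance is the mean absolute difference
    between the sorted arrays.
    """
    a = np.sort(np.asarray(samples_a, dtype=float))
    b = np.sort(np.asarray(samples_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("sample sets must have the same size")
    return float(np.mean(np.abs(a - b)))


# Usage: compare a model's predicted return samples with a Bellman target.
predicted = np.array([0.5, 1.5, 2.5])
target = bellman_target_samples(reward=1.0, next_return_samples=[0.0, 1.0, 2.0], gamma=0.5)
distance = wasserstein1(predicted, target)
```

Sorting both sample sets before comparing them is what makes this a distance between distributions rather than between paired samples: the ordering of the samples within each array does not affect the result.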

