Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning

02/06/2022
by Michael Teng, et al.

Distributional reinforcement learning (RL) aims to learn a value network that predicts the full distribution of returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL as a principled replacement for the single-sample target value estimation commonly employed in current practice. The improved distributional estimates further lend themselves to exploration based on upper confidence bounds (UCB). We combine these two ideas to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control. We provide further insight into the method through visualization and analysis of the learned distributions and their evolution during training.
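
To make the ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a multi-sample target can be formed by pooling the critic's quantile outputs for several sampled next actions, that the critic is trained with the standard quantile Huber loss, and that a UCB-style score takes the form mean plus a multiple of the standard deviation of the predicted quantiles. The function names (`multi_sample_targets`, `quantile_huber_loss`, `ucb_score`) and the parameters `kappa` and `beta` are hypothetical and chosen for illustration only.

```python
import numpy as np


def multi_sample_targets(reward, gamma, next_quantiles):
    """Pool target samples from K sampled next actions (assumed scheme).

    next_quantiles has shape (K, N): N critic quantiles for each of K
    sampled next actions. K = 1 corresponds to a single-sample target.
    """
    return reward + gamma * np.asarray(next_quantiles, dtype=float).reshape(-1)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Standard quantile Huber loss between N predictions and M targets."""
    pred_quantiles = np.asarray(pred_quantiles, dtype=float)
    target_samples = np.asarray(target_samples, dtype=float)
    n = pred_quantiles.shape[0]
    taus = (np.arange(n) + 0.5) / n  # quantile midpoints in (0, 1)
    # Pairwise errors between every target and every prediction: shape (N, M).
    td = target_samples[None, :] - pred_quantiles[:, None]
    abs_td = np.abs(td)
    huber = np.where(abs_td <= kappa, 0.5 * td**2, kappa * (abs_td - 0.5 * kappa))
    # Asymmetric weight that pushes each output toward its own quantile level.
    weight = np.abs(taus[:, None] - (td < 0).astype(float))
    # Average over targets, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())


def ucb_score(quantiles, beta=1.0):
    """Optimistic value estimate: mean + beta * std (assumed form)."""
    quantiles = np.asarray(quantiles, dtype=float)
    return float(quantiles.mean() + beta * quantiles.std())


# Usage: two sampled next actions, two quantiles each.
targets = multi_sample_targets(1.0, 0.5, [[0.0, 2.0], [4.0, 6.0]])
loss = quantile_huber_loss([1.5, 3.5], targets)
score = ucb_score([1.0, 3.0], beta=2.0)
```

In this sketch, increasing the number of sampled next actions only enlarges the pool of target samples; the loss itself is unchanged, which is one simple way a multi-sample target could replace a single-sample one.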


research
01/09/2020

Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function

In current reinforcement learning (RL) methods, function approximation e...
research
02/22/2018

An Analysis of Categorical Distributional Reinforcement Learning

Distributional approaches to value-based reinforcement learning model th...
research
12/29/2022

Invariance to Quantile Selection in Distributional Continuous Control

In recent years distributional reinforcement learning has produced many ...
research
04/21/2022

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

Actor-critic algorithms that make use of distributional policy evaluatio...
research
06/09/2021

Bayesian Bellman Operators

We introduce a novel perspective on Bayesian reinforcement learning (RL)...
research
07/15/2021

Statistical modeling of corneal OCT speckle. A distributional model-free approach

In biomedical optics, it is often of interest to statistically model the...
research
10/01/2019

Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping

The distributional perspective on reinforcement learning (RL) has given ...
