Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

10/26/2021
by Pushi Zhang, et al.

A growing trend in value-based reinforcement learning (RL) is to capture more information in the value network than a scalar value function provides. One of the best-known methods in this direction is distributional RL, which models the return distribution instead of a scalar value. In another line of work, hybrid reward architectures (HRA) model a source-specific value function for each source of reward, which has also been shown to improve performance. To inherit the benefits of both distributional RL and hybrid reward architectures, we introduce Multi-Dimensional Distributional DQN (MD3QN), which extends distributional RL to model the joint return distribution from multiple reward sources. As a by-product of joint distribution modeling, MD3QN captures not only the randomness in the return from each source of reward, but also the rich correlation between the randomness of different sources. We prove convergence of the joint distributional Bellman operator and build our empirical algorithm by minimizing the Maximum Mean Discrepancy between the joint return distribution and its Bellman target. In experiments, our method accurately models the joint return distribution in environments with richly correlated reward functions, and outperforms previous RL methods that use multi-dimensional reward functions in the control setting.
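To make the training objective mentioned in the abstract more concrete, the sketch below estimates the squared Maximum Mean Discrepancy between two sets of multi-dimensional return samples, one drawn from the current joint return model and one from a Bellman target. It is an illustration only, not the authors' implementation: the Gaussian kernel, the `bandwidth` and `gamma` defaults, and the function names `gaussian_kernel`, `mmd_squared`, and `bellman_target` are all assumptions, since the abstract does not specify them.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two return vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets.

    xs, ys: lists of equal-length tuples, one entry per reward source.
    """
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def bellman_target(rewards, next_returns, gamma=0.99):
    """Shift and scale next-state return samples per source: r_i + gamma * z_i."""
    return [
        tuple(r + gamma * z for r, z in zip(rewards, sample))
        for sample in next_returns
    ]


# Hypothetical two-source example: predicted samples versus target samples.
predicted = [(1.0, 2.0), (1.5, 2.5), (0.5, 1.5)]
target = bellman_target((1.0, 0.0), [(0.0, 2.0), (0.5, 2.5)], gamma=0.5)
loss = mmd_squared(predicted, target)
```

Because each sample is a vector with one component per reward source, the kernel compares whole vectors, so the estimate is sensitive to correlations between sources and not only to the per-source marginals.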


research
09/29/2022

How Does Value Distribution in Distributional Reinforcement Learning Help Optimization?

We consider the problem of learning a set of probability distributions f...
research
11/06/2019

Distributional Reward Decomposition for Reinforcement Learning

Many reinforcement learning (RL) tasks have specific properties that can...
research
04/27/2023

One-Step Distributional Reinforcement Learning

Reinforcement learning (RL) allows an agent interacting sequentially wit...
research
08/06/2018

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

The recently proposed distributional approach to reinforcement learning ...
research
07/24/2020

Distributional Reinforcement Learning with Maximum Mean Discrepancy

Distributional reinforcement learning (RL) has achieved state-of-the-art...
research
02/19/2023

Distributional Offline Policy Evaluation with Predictive Error Guarantees

We study the problem of estimating the distribution of the return of a p...
research
05/25/2023

The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

While distributional reinforcement learning (RL) has demonstrated empiri...
