Distributional Reinforcement Learning via Sinkhorn Iterations

02/01/2022
by Ke Sun, et al.

Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. How each return distribution is represented and which distribution divergence is chosen are pivotal for the empirical success of distributional RL. In this paper, we propose a new class of Sinkhorn distributional RL algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then leverage Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. Remarkably, Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy (MMD). This allows our proposed Sinkhorn distributional RL algorithms to find a sweet spot that leverages both the geometry of optimal transport-based distances and the unbiased gradient estimates of MMD. Finally, experiments on a suite of Atari games show that Sinkhorn distributional RL performs competitively with existing state-of-the-art algorithms.
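To make the idea of evaluating a Sinkhorn distance between two sets of return samples concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the squared-difference cost, the uniform sample weights, the function names, and the parameters eps and n_iters are all assumptions chosen for illustration. The debiased form in sinkhorn_divergence is one common variant and may differ from the quantity used in the paper.

```python
import numpy as np


def sinkhorn_cost(x, y, eps=1.0, n_iters=200):
    """Entropy-regularized transport cost between two 1-D sample sets.

    Assumptions (illustrative only): uniform weights on the samples,
    squared-difference ground cost, fixed number of Sinkhorn iterations.
    Returns the transport cost <P, C> and the transport plan P.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(1, -1)
    C = (x - y) ** 2                      # pairwise cost matrix, shape (n, m)
    n, m = C.shape
    a = np.full(n, 1.0 / n)               # uniform weights on current samples
    b = np.full(m, 1.0 / m)               # uniform weights on target samples
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # approximate optimal transport plan
    return float(np.sum(P * C)), P


def sinkhorn_divergence(x, y, eps=1.0, n_iters=200):
    """One common debiased form: cost(x, y) minus half of each self-cost."""
    cost_xy, _ = sinkhorn_cost(x, y, eps, n_iters)
    cost_xx, _ = sinkhorn_cost(x, x, eps, n_iters)
    cost_yy, _ = sinkhorn_cost(y, y, eps, n_iters)
    return cost_xy - 0.5 * cost_xx - 0.5 * cost_yy


# Hypothetical "current" and "target" return samples.
current = np.array([0.0, 1.0, 2.0])
target = np.array([0.5, 1.5, 2.5])
cost, plan = sinkhorn_cost(current, target)
div = sinkhorn_divergence(current, target)
```

In this sketch, eps controls the strength of the entropic regularization. Small values bring the plan closer to the unregularized optimal transport solution, but they can underflow in this plain (non-log-domain) form. Using the result as a training loss would additionally require a differentiable implementation in an autodiff framework.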


