Distributional Reinforcement Learning with Maximum Mean Discrepancy

by Thanh Tang Nguyen, et al.

Distributional reinforcement learning (RL) has achieved state-of-the-art performance in Atari games by recasting traditional RL as a distribution estimation problem, explicitly estimating the probability distribution of the total return instead of its expectation. The bottleneck in distributional RL lies in estimating this distribution, where one must resort to an approximate representation of the return distributions, which are infinite-dimensional. Most existing methods focus on learning a set of predefined statistic functionals of the return distributions, which requires involved projections to maintain the order statistics. We take a different perspective using deterministic sampling, wherein we approximate the return distributions with a set of deterministic particles that are not attached to any predefined statistic functional, allowing us to freely approximate the return distributions. Learning is then interpreted as the evolution of these particles so that a distance between the return distribution and its target distribution is minimized. This learning objective is realized via the maximum mean discrepancy (MMD) distance, which in turn leads to a simpler loss amenable to backpropagation. Experiments on the suite of Atari 2600 games show that our algorithm outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
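
To make the particle-based objective concrete, the following is a minimal sketch of a squared MMD estimate between a set of current particles and a set of target particles. It is not the authors' implementation: the abstract does not name a kernel, so the Gaussian kernel, the `bandwidth` parameter, the biased (V-statistic) estimator, and the Bellman-style target `reward + gamma * z` are all assumptions made for illustration, as are the function names.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars. The kernel choice and bandwidth
    are assumptions; the abstract does not specify either."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two
    particle sets: E[k(z, z')] + E[k(t, t')] - 2 E[k(z, t)]."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in a for y in b)
        return total / (len(a) * len(b))

    return (mean_kernel(particles, particles)
            + mean_kernel(targets, targets)
            - 2.0 * mean_kernel(particles, targets))

# Hypothetical numbers: particles for one state-action pair, and target
# particles built by shifting and scaling next-state particles
# (an assumed Bellman-style target, reward + gamma * z).
reward, gamma = 1.0, 0.99
next_particles = [0.0, 0.5, 1.0]
targets = [reward + gamma * z for z in next_particles]
current = [0.2, 0.9, 1.7]

loss = mmd_squared(current, targets)
print(f"squared MMD between current and target particles: {loss:.6f}")
```

Because every term is a smooth function of the particle locations, a loss of this form can be differentiated with respect to the particles directly, with no sorting or projection step. In a real agent the particles would be network outputs and the computation would live in an autodiff framework rather than plain Python.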


Distributional Reinforcement Learning via Sinkhorn Iterations

Distributional reinforcement learning (RL) is a class of state-of-the-ar...

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Distributional Reinforcement Learning (RL) differs from traditional RL i...

Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm

Distributional reinforcement learning (RL) is a class of state-of-the-ar...

Statistics and Samples in Distributional Reinforcement Learning

We present a unifying framework for designing and analysing distribution...

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

A growing trend for value-based reinforcement learning (RL) algorithms i...

Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks

The distributional reinforcement learning (RL) approach advocates for re...

The Potential of the Return Distribution for Exploration in RL

This paper studies the potential of the return distribution for explorat...

Code Repositories


Code holder for https://arxiv.org/abs/2007.12354
