Distributional Reinforcement Learning with Maximum Mean Discrepancy

07/24/2020
by Thanh Tang Nguyen, et al.

Distributional reinforcement learning (RL) has achieved state-of-the-art performance in Atari games by recasting traditional RL as a distribution estimation problem, explicitly estimating the probability distribution of the total return instead of its expectation. The bottleneck in distributional RL lies in estimating this distribution: because return distributions are infinite-dimensional, one must resort to an approximate representation. Most existing methods learn a set of predefined statistic functionals of the return distributions, which requires involved projections to maintain the order statistics. We take a different perspective based on deterministic sampling: we approximate the return distributions with a set of deterministic particles that are not attached to any predefined statistic functional, which lets us approximate the return distributions freely. Learning is then interpreted as the evolution of these particles so that a distance between the return distribution and its target distribution is minimized. This objective is realized via the maximum mean discrepancy (MMD) distance, which in turn leads to a simpler loss amenable to backpropagation. Experiments on the suite of Atari 2600 games show that our algorithm outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
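To make the particle-and-MMD idea concrete, here is a minimal sketch of a squared MMD between a set of return particles and a set of target particles. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth value, the biased (V-statistic) estimator, the one-step Bellman target, and all function names are hypothetical choices made for this example.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    # Assumed kernel: exp(-(a - b)^2 / bandwidth); the bandwidth is illustrative.
    return math.exp(-((a - b) ** 2) / bandwidth)


def mean_kernel(xs, ys, bandwidth):
    # Average kernel value over all pairs drawn from the two particle sets.
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def squared_mmd(particles, targets, bandwidth=1.0):
    # Biased estimate of MMD^2 between the two empirical distributions.
    return (
        mean_kernel(particles, particles, bandwidth)
        + mean_kernel(targets, targets, bandwidth)
        - 2.0 * mean_kernel(particles, targets, bandwidth)
    )


def bellman_targets(reward, discount, next_particles):
    # Assumed one-step target: shift and scale the next-state particles.
    return [reward + discount * z for z in next_particles]


# Hypothetical particles for the current and next state-action pairs.
current = [0.0, 0.5, 1.0]
targets = bellman_targets(reward=1.0, discount=0.9, next_particles=[0.0, 0.5, 1.0])
loss = squared_mmd(current, targets)
print(loss)
```

In a learning setting the particles would be network outputs and this quantity would be minimized by gradient descent with the targets held fixed; the plain-Python version here only shows how the loss is assembled from pairwise kernel evaluations.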

research
02/01/2022

Distributional Reinforcement Learning via Sinkhorn Iterations

Distributional reinforcement learning (RL) is a class of state-of-the-ar...
research
11/05/2019

Fully Parameterized Quantile Function for Distributional Reinforcement Learning

Distributional Reinforcement Learning (RL) differs from traditional RL i...
research
02/21/2019

Statistics and Samples in Distributional Reinforcement Learning

We present a unifying framework for designing and analysing distribution...
research
01/09/2020

Addressing Value Estimation Errors in Reinforcement Learning with a State-Action Return Distribution Function

In current reinforcement learning (RL) methods, function approximation e...
research
06/11/2018

The Potential of the Return Distribution for Exploration in RL

This paper studies the potential of the return distribution for explorat...
research
10/26/2021

Distributional Reinforcement Learning for Multi-Dimensional Reward Functions

A growing trend for value-based reinforcement learning (RL) algorithms i...
research
06/06/2021

Distributional Reinforcement Learning with Unconstrained Monotonic Neural Networks

The distributional reinforcement learning (RL) approach advocates for re...
