A note on reinforcement learning with Wasserstein distance regularisation, with applications to multipolicy learning

02/12/2018
by Mohammed Amin Abdullah, et al.

In this note we describe an application of the Wasserstein distance to reinforcement learning. The Wasserstein distance in question is between two distributions: the distribution obtained by mapping a policy's trajectories into some metric space, and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so, given an underlying metric, the Wasserstein distance quantifies how different policies are. Using a Wasserstein regulariser, this can be used to learn multiple policies that differ from one another in terms of such Wasserstein distances. By changing the sign of the regularisation parameter, one can instead learn a policy whose trajectory-mapping distribution is attracted to a given fixed distribution.
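As a rough illustration, here is a minimal sketch of how such a regularised objective could be evaluated from sampled trajectories. It is not the note's own method, and it makes several assumptions.

- The metric space is the real line.
- Each trajectory is mapped to its final state by the hypothetical `final_state` embedding.
- The objective being maximised is mean return plus `lam` times the 1-Wasserstein distance to the fixed reference sample. Under this convention, `lam > 0` pushes the policy's distribution away from the reference and `lam < 0` attracts it.
- The reference sample has the same size as the policy sample.

The names `final_state`, `wasserstein_1d` and `regularised_objective` are illustrative only. The sketch does not cover how gradients with respect to the policy are obtained.

```python
from typing import Callable, List, Sequence

Trajectory = List[float]  # hypothetical: a trajectory as a list of 1-D states


def final_state(traj: Trajectory) -> float:
    """Hypothetical embedding of a trajectory into the real line: its last state."""
    return traj[-1]


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Empirical 1-Wasserstein distance between two equal-size samples on the real line.

    For uniform empirical measures of equal size in 1-D, an optimal coupling
    matches the i-th smallest point of one sample with the i-th smallest of the other.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def regularised_objective(
    trajectories: Sequence[Trajectory],
    returns: Sequence[float],
    reference: Sequence[float],
    lam: float,
    embed: Callable[[Trajectory], float] = final_state,
) -> float:
    """Mean return plus lam * W1(embedded trajectories, reference).

    Assumed sign convention (the objective is maximised):
    lam > 0 rewards distance from the reference (repulsion, e.g. to diversify policies);
    lam < 0 penalises distance from the reference (attraction towards it).
    """
    embeddings = [embed(t) for t in trajectories]
    mean_return = sum(returns) / len(returns)
    return mean_return + lam * wasserstein_1d(embeddings, reference)


if __name__ == "__main__":
    trajs = [[0.0, 0.0], [0.0, 1.0], [1.0, 2.0]]  # final states 0, 1, 2
    ref = [1.0, 2.0, 3.0]                         # e.g. embeddings from another policy
    rets = [1.0, 1.0, 1.0]
    print(regularised_objective(trajs, rets, ref, lam=0.5))   # repulsive regulariser
    print(regularised_objective(trajs, rets, ref, lam=-0.5))  # attractive regulariser
```

Flipping the sign of `lam` is the only change between the "be different from the reference" and "be close to the reference" settings, which mirrors the role the abstract gives the regularisation parameter. In higher-dimensional metric spaces the sorted-sample shortcut no longer applies, and a general optimal-transport solver would be needed instead.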


Related research
02/24/2018

Minimax Distribution Estimation in Wasserstein Distance

The Wasserstein metric is an important measure of distance between proba...
06/04/2020

Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion

We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO...
12/19/2017

On Wasserstein Reinforcement Learning and the Fokker-Planck equation

Policy gradients methods often achieve better performance when the chang...
10/01/2019

Wasserstein Neural Processes

Neural Processes (NPs) are a class of models that learn a mapping from a...
12/07/2020

The Spectral-Domain 𝒲_2 Wasserstein Distance for Elliptical Processes and the Spectral-Domain Gelbrich Bound

In this short note, we introduce the spectral-domain 𝒲_2 Wasserstein dis...
05/16/2022

Wasserstein t-SNE

Scientific datasets often have hierarchical structure: for example, in s...
11/28/2022

Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand

We derive a learning framework to generate routing/pickup policies for a...
