Wasserstein Reinforcement Learning

06/11/2019
by Aldo Pacchiano, et al.

We propose behavior-driven optimization via Wasserstein distances (WDs) to improve several classes of state-of-the-art reinforcement learning (RL) algorithms. We show that WD regularizers acting on appropriate policy embeddings efficiently incorporate behavioral characteristics into policy optimization. We demonstrate that they improve Evolution Strategy methods by encouraging more efficient exploration, that they can be applied in imitation learning, and that they speed up training of Trust Region Policy Optimization methods. Since the exact computation of WDs is expensive, we develop approximate algorithms that replace exact WD computations in the RL tasks considered. These algorithms combine three methods: the dual formulation of the optimal transport problem, alternating optimization, and random feature maps. We provide theoretical analysis of our algorithms and exhaustive empirical evaluation in a variety of RL settings.
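To make the combination of a dual formulation, alternating optimization and random feature maps concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes an entropy-smoothed dual objective, a squared Euclidean cost, random Fourier features for the dual potentials, and plain alternating gradient ascent; none of these choices are stated in the abstract. The names `random_fourier_features` and `smoothed_wd_dual`, the toy two-dimensional "behavior embeddings", and all hyperparameters are hypothetical.

```python
import numpy as np


def random_fourier_features(x, W, b):
    """Map points of shape (n, d) to random Fourier features of shape (n, m)."""
    m = W.shape[1]
    return np.sqrt(2.0 / m) * np.cos(x @ W + b)


def smoothed_wd_dual(xs, ys, gamma=1.0, num_features=64, steps=400, lr=0.05, seed=0):
    """Approximately maximize an entropy-smoothed optimal transport dual.

    Assumed objective (up to an additive constant):
        mean_i f(x_i) + mean_j g(y_j)
        - gamma * mean_ij exp((f(x_i) + g(y_j) - c(x_i, y_j)) / gamma)
    with f and g linear in shared random Fourier features.
    """
    rng = np.random.default_rng(seed)
    d = xs.shape[1]
    W = rng.normal(size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    phi_x = random_fourier_features(xs, W, b)
    phi_y = random_fourier_features(ys, W, b)
    # Pairwise squared Euclidean cost, shape (len(xs), len(ys)).
    cost = ((xs[:, None, :] - ys[None, :, :]) ** 2).sum(axis=-1)
    w_f = np.zeros(num_features)
    w_g = np.zeros(num_features)

    def potentials(w_f, w_g):
        f = phi_x @ w_f
        g = phi_y @ w_g
        p = np.exp((f[:, None] + g[None, :] - cost) / gamma)
        return f, g, p

    # Alternating optimization: one ascent step on f, then one on g.
    for _ in range(steps):
        _, _, p = potentials(w_f, w_g)
        w_f = w_f + lr * phi_x.T @ (1.0 - p.mean(axis=1)) / len(xs)
        _, _, p = potentials(w_f, w_g)
        w_g = w_g + lr * phi_y.T @ (1.0 - p.mean(axis=0)) / len(ys)

    f, g, p = potentials(w_f, w_g)
    return f.mean() + g.mean() - gamma * p.mean()


# Toy data standing in for policy embeddings (hypothetical, not from the paper).
rng = np.random.default_rng(1)
behaviors_a = rng.normal(size=(40, 2))
behaviors_near = rng.normal(size=(40, 2))
behaviors_far = rng.normal(size=(40, 2)) + 4.0

near_value = smoothed_wd_dual(behaviors_a, behaviors_near)
far_value = smoothed_wd_dual(behaviors_a, behaviors_far)
print(near_value, far_value)
```

In this sketch the returned value is a lower bound on the smoothed dual optimum, so it only approximates a distance. Under these assumptions it should be larger for the shifted cloud than for the nearby one, which is the property a behavioral regularizer would rely on.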

research
10/12/2020

Efficient Wasserstein Natural Gradients for Reinforcement Learning

A novel optimization approach is proposed for application to policy grad...
research
08/09/2018

Policy Optimization as Wasserstein Gradient Flows

Policy optimization is a core component of reinforcement learning (RL), ...
research
01/20/2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Reinforcement learning (RL) has been widely studied for improving sequen...
research
10/20/2022

Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Policy Optimization (PO) algorithms have been proven particularly suited...
research
06/17/2023

Active Policy Improvement from Multiple Black-box Oracles

Reinforcement learning (RL) has made significant strides in various comp...
research
03/01/2018

Hierarchical Imitation and Reinforcement Learning

We study the problem of learning policies over long time horizons. We pr...
research
05/09/2023

Assessment of Reinforcement Learning Algorithms for Nuclear Power Plant Fuel Optimization

The nuclear fuel loading pattern optimization problem has been studied s...
