Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

10/20/2022
by   Antonio Terpin, et al.

Policy Optimization (PO) algorithms have proven particularly well suited to the high dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods are a popular approach to stabilizing policy updates. These methods usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance is a natural alternative to the KL divergence for defining trust regions or regularizing the objective function. However, state-of-the-art works either resort to approximations of it or do not provide an algorithm for continuous state-action spaces, which reduces the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm - Optimal Transport Trust Region Policy Optimization (OT-TRPO) - for continuous state-action spaces. We circumvent the infinite-dimensional optimization problem for PO by providing a one-dimensional dual reformulation for which strong duality holds. We then analytically derive the optimal policy update given the solution of the dual problem. This way, we bypass the computation of optimal transport costs and optimal transport maps, which we characterize implicitly by solving the dual formulation. Finally, we provide an experimental evaluation of our approach across various control tasks. Our results show that optimal transport discrepancies can offer an advantage over state-of-the-art approaches.
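To make the idea of a one-dimensional dual more concrete, here is a minimal sketch for a single state with a small discrete action grid. It is not the paper's algorithm, which targets continuous actions. It assumes the standard dual form of a transport-constrained linear objective, g(lam) = lam * epsilon + sum over a of p_old(a) * max over a' of [A(a') - lam * c(a, a')], minimized over the scalar lam >= 0. The squared-distance cost, the grid, the advantage values, the names `dual_objective` and `solve_dual`, and the golden-section search are all illustrative assumptions.

```python
import math


def dual_objective(lam, actions, old_probs, advantages, epsilon):
    """Evaluate g(lam) = lam*eps + sum_a p_old(a) * max_a' [A(a') - lam*c(a, a')].

    Assumption: the transport cost is c(a, a') = (a - a')**2.
    """
    total = lam * epsilon
    for a, p in zip(actions, old_probs):
        # Best destination for the mass currently sitting at action a.
        total += p * max(adv - lam * (a - b) ** 2
                         for b, adv in zip(actions, advantages))
    return total


def solve_dual(actions, old_probs, advantages, epsilon,
               lam_max=100.0, iters=200):
    """Minimize g over [0, lam_max] by golden-section search.

    g is convex in lam (a nonnegative combination of maxima of affine
    functions plus a linear term), so a bracketing search is valid.
    Assumption: the minimizer lies below lam_max; otherwise the result
    is truncated at that bound.
    """
    def g(lam):
        return dual_objective(lam, actions, old_probs, advantages, epsilon)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = hi - inv_phi * (hi - lo)
        m2 = lo + inv_phi * (hi - lo)
        if g(m1) <= g(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    lam = 0.5 * (lo + hi)
    return lam, g(lam)


# Hypothetical toy data: five actions, uniform old policy, made-up advantages.
actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
old_probs = [0.2, 0.2, 0.2, 0.2, 0.2]
advantages = [0.0, 0.1, 0.3, 0.6, 1.0]

lam_star, value = solve_dual(actions, old_probs, advantages, epsilon=0.1)
print("dual multiplier:", lam_star)
print("dual value (bound on the expected advantage):", value)
```

In this toy setting, a radius of zero leaves the old policy unchanged, so the dual value equals its expected advantage, and enlarging the radius can only raise the value. The search runs over one scalar regardless of how many actions there are, which is the appeal of a one-dimensional dual.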


research
02/15/2021

Distributionally-Constrained Policy Optimization via Unbalanced Optimal Transport

We consider constrained policy optimization in Reinforcement Learning, w...
research
06/25/2023

Provably Convergent Policy Optimization via Metric-aware Trust Region Methods

Trust-region methods based on Kullback-Leibler divergence are pervasivel...
research
06/02/2021

Partial Wasserstein Covering

We consider a general task called partial Wasserstein covering with the ...
research
03/20/2022

Distributionally robust risk evaluation with causality constraint and structural information

This work studies distributionally robust evaluation of expected functio...
research
06/12/2020

Handling Multiple Costs in Optimal Transport: Strong Duality and Efficient Computation

We introduce an extension of the optimal transportation (OT) problem whe...
research
06/11/2019

Wasserstein Reinforcement Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) ...
research
08/10/2023

Unifying Distributionally Robust Optimization via Optimal Transport Theory

In the past few years, there has been considerable interest in two promi...
