Policy Optimization as Wasserstein Gradient Flows

08/09/2018
by Ruiyi Zhang, et al.

Policy optimization is a core component of reinforcement learning (RL). Most existing RL methods directly optimize the parameters of a policy by maximizing the expected total reward or a surrogate of it. Although these methods often achieve encouraging empirical success, the mathematical principle underlying policy-distribution optimization remains unclear. We place policy optimization in the space of probability measures and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes a convex problem in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings and is related to many state-of-the-art policy-optimization algorithms. Empirical results verify the effectiveness of our framework, which often performs better than related algorithms.
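
For readers unfamiliar with the term, a discrete gradient flow is commonly written as the Jordan-Kinderlehrer-Otto (JKO) scheme. The block below is a sketch of that standard form, not a formula quoted from this abstract: the energy functional $F$, the step size $h$, the iterates $\mu_k$, and the 2-Wasserstein distance $W_2$ are notation assumed here for illustration.

```latex
% Assumed notation (not from the abstract):
%   F      : energy functional on probability measures (e.g., a negative expected reward plus a regularizer)
%   h > 0  : step size of the time discretization
%   \mu_k  : distribution at iteration k
%   W_2    : 2-Wasserstein distance
\mu_{k+1} \;=\; \operatorname*{arg\,min}_{\mu}
  \left\{ F(\mu) \;+\; \frac{1}{2h}\, W_2^2\!\left(\mu, \mu_k\right) \right\}
```

Each step trades off decreasing $F$ against staying close, in Wasserstein distance, to the previous distribution. Under suitable regularity conditions, the iterates approximate the continuous-time gradient flow of $F$ as $h \to 0$.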

research
06/11/2019

Wasserstein Reinforcement Learning

We propose behavior-driven optimization via Wasserstein distances (WDs) ...
research
10/12/2020

Efficient Wasserstein Natural Gradients for Reinforcement Learning

A novel optimization approach is proposed for application to policy grad...
research
02/22/2023

From Optimization to Sampling Through Gradient Flows

This article overviews how gradient flows, and discretizations thereof, ...
research
05/17/2019

Stochastically Dominant Distributional Reinforcement Learning

We describe a new approach for mitigating risk in the Reinforcement Lear...
research
05/12/2023

Quantile-Based Deep Reinforcement Learning using Two-Timescale Policy Gradient Algorithms

Classical reinforcement learning (RL) aims to optimize the expected cumu...
research
12/19/2017

On Wasserstein Reinforcement Learning and the Fokker-Planck equation

Policy gradient methods often achieve better performance when the chang...
research
10/24/2020

Gradient Flows in Dataset Space

The current practice in machine learning is traditionally model-centric,...
