An Analysis of Measure-Valued Derivatives for Policy Gradients

03/08/2022
by João Carvalho, et al.
Reinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low-variance) and accurate (low-bias) gradient estimator is crucial for tackling increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high-variance estimates. More modern approaches exploit the reparametrization trick, which gives lower-variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with both differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach performance comparable to methods based on the likelihood-ratio or reparametrization tricks, in both low- and high-dimensional action spaces. With this work, we want to show that the Measure-Valued Derivative estimator can be a useful alternative to other policy gradient estimators.
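To make the three estimators named in the abstract concrete, the following is a minimal sketch on a toy problem, not the paper's actor-critic setup. It estimates the gradient of E[f(x)] with respect to the mean of a one-dimensional Gaussian. The test function `f`, its derivative `df`, the helper `gradient_samples`, and the chosen parameter values are all hypothetical and introduced here only for illustration. The measure-valued estimator uses the standard decomposition of the Gaussian mean derivative into a scaled difference of two Rayleigh-shifted measures.

```python
import numpy as np


def f(x):
    """Hypothetical test function; stands in for a critic or value estimate."""
    return x ** 2


def df(x):
    """Derivative of f, needed only by the reparametrization estimator."""
    return 2.0 * x


def gradient_samples(mu, sigma, n, seed=0):
    """Per-sample estimates of d/dmu E_{x ~ N(mu, sigma^2)}[f(x)]."""
    rng = np.random.default_rng(seed)

    # Likelihood-ratio (score function): f(x) * d/dmu log N(x; mu, sigma^2).
    x = rng.normal(mu, sigma, size=n)
    lr = f(x) * (x - mu) / sigma ** 2

    # Reparametrization: x = mu + sigma * eps, so d/dmu f(x) = f'(x).
    eps = rng.standard_normal(n)
    rp = df(mu + sigma * eps)

    # Measure-valued derivative for the Gaussian mean:
    # c * (E[f(mu + sigma * R)] - E[f(mu - sigma * R)]), with R ~ Rayleigh(1)
    # and c = 1 / (sigma * sqrt(2 * pi)). The same R is reused for both
    # branches (a coupling choice made here); f is only evaluated, never
    # differentiated.
    r = rng.rayleigh(scale=1.0, size=n)
    c = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    mvd = c * (f(mu + sigma * r) - f(mu - sigma * r))

    return {
        "likelihood_ratio": lr,
        "reparametrization": rp,
        "measure_valued": mvd,
    }


samples = gradient_samples(mu=1.5, sigma=1.0, n=200_000)
for name, g in samples.items():
    print(f"{name:>18}: mean={g.mean():.3f}  var={g.var():.3f}")
```

For this toy function the exact gradient is 2 * mu, so all three sample means should lie close to it, while the per-sample variances differ. Note that the measure-valued estimator costs two evaluations of `f` per sample and never calls `df`, which is why it also applies when the function approximator is not differentiable.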

research
07/20/2021

An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients

Reinforcement learning methods for robotics are increasingly successful ...
research
02/05/2019

Total stochastic gradient algorithms and applications in reinforcement learning

Backpropagation and the chain rule of derivatives have been prominent; h...
research
02/02/2022

Do Differentiable Simulators Give Better Policy Gradients?

Differentiable simulators promise faster computation time for reinforcem...
research
04/09/2020

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcem...
research
10/21/2019

All-Action Policy Gradient Methods: A Numerical Integration Approach

While often stated as an instance of the likelihood ratio trick [Rubinst...
research
03/16/2023

Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets

Understanding and analyzing markets is crucial, yet analytical equilibri...
research
02/27/2018

The Mirage of Action-Dependent Baselines in Reinforcement Learning

Policy gradient methods are a widely used class of model-free reinforcem...
