All-Action Policy Gradient Methods: A Numerical Integration Approach

10/21/2019
by Benjamin Petit, et al.

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this integral can be computed, the resulting "all-action" estimator [Sutton, 2001] provides a conditioning effect [Bratley, 1987] that significantly reduces variance compared to the REINFORCE estimator [Williams, 1992]. In this paper, we adopt a numerical integration perspective to broaden the applicability of the all-action estimator to general action spaces and to arbitrary function classes for the policy and critic, beyond the Gaussian case considered by [Ciosek, 2018]. In addition, we provide a new theoretical result on the effect of using a biased critic, which offers more practical guidance than the earlier "compatible features" condition of [Sutton, 1999]. We demonstrate the benefit of our approach on continuous control tasks with nonlinear function approximation, where it improves both performance and sample efficiency.
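For intuition, the policy gradient theorem writes the gradient as an integral over actions, grad_theta J = integral over a of pi_theta(a|s) grad_theta log pi_theta(a|s) Q(s, a) da, which REINFORCE estimates from a single sampled action while an all-action estimator evaluates the whole integral. The sketch below (not the paper's implementation; the 1-D Gaussian policy, the toy critic q_value, and all function names are illustrative assumptions) contrasts the two on a single state using a simple trapezoid-rule quadrature:

import numpy as np

def policy_density(a, theta):
    # Gaussian policy N(theta, 1) over a 1-D action space (assumption).
    return np.exp(-0.5 * (a - theta) ** 2) / np.sqrt(2 * np.pi)

def grad_log_policy(a, theta):
    # d/dtheta log pi_theta(a) for the Gaussian above.
    return a - theta

def q_value(a):
    # Stand-in critic Q(s, a); in practice this is a learned approximator.
    return -(a - 1.0) ** 2

def reinforce_estimate(theta, rng):
    # Likelihood-ratio (REINFORCE) estimator from one sampled action.
    a = rng.normal(theta, 1.0)
    return grad_log_policy(a, theta) * q_value(a)

def all_action_estimate(theta, num_points=201, width=6.0):
    # All-action estimator: numerically integrate
    # pi_theta(a) * grad_theta log pi_theta(a) * Q(s, a) over the action space.
    grid = np.linspace(theta - width, theta + width, num_points)
    integrand = policy_density(grid, theta) * grad_log_policy(grid, theta) * q_value(grid)
    return np.trapz(integrand, grid)

rng = np.random.default_rng(0)
samples = [reinforce_estimate(0.0, rng) for _ in range(1000)]
print("REINFORCE mean/std:", np.mean(samples), np.std(samples))
print("All-action (deterministic):", all_action_estimate(0.0))

On this toy problem the all-action estimate is deterministic for a fixed grid, while the REINFORCE samples scatter around the same gradient value, which is the variance-reduction effect the abstract describes.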


Related Research

Clipped Action Policy Gradient (02/21/2018)
Many continuous control tasks have bounded action spaces and clip out-of...

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines (06/20/2017)
We show how an action-dependent baseline can be used by the policy gradi...

An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients (07/20/2021)
Reinforcement learning methods for robotics are increasingly successful ...

An Analysis of Measure-Valued Derivatives for Policy Gradients (03/08/2022)
Reinforcement learning methods for robotics are increasingly successful ...

Policy Gradient Methods in the Presence of Symmetries and State Abstractions (05/09/2023)
Reinforcement learning on high-dimensional and complex problems relies o...

On All-Action Policy Gradients (10/24/2022)
In this paper, we analyze the variance of stochastic policy gradient wit...

Policy Optimization with Second-Order Advantage Information (05/09/2018)
Policy optimization on high-dimensional continuous control tasks exhibit...
