Clipped Action Policy Gradient

02/21/2018
by Yasuhiro Fujita, et al.

Many continuous control tasks have bounded action spaces and clip out-of-bound actions before execution. Policy gradient methods often optimize policies as if actions were not clipped. We propose clipped action policy gradient (CAPG) as an alternative policy gradient estimator that exploits the knowledge that actions are clipped to reduce estimation variance. We prove that CAPG is unbiased and achieves lower variance than the original estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the original estimator, indicating its promise as a better policy gradient estimator for continuous control tasks.
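To make the idea concrete, below is a minimal sketch (not the authors' code) of how a clipping-aware log-probability term could be computed for a one-dimensional Gaussian policy whose actions are clipped to [low, high], using NumPy and SciPy. The function name and the exact form of the boundary terms are illustrative assumptions based on the abstract, not reproduced from the paper.

import numpy as np
from scipy.stats import norm

def clipped_action_logprob(u, mean, std, low, high):
    """Log-probability term for the clipped action a = clip(u, low, high).

    u:          pre-clip action sampled from N(mean, std^2)
    mean, std:  Gaussian policy parameters at the current state
    low, high:  action bounds enforced by the environment
    (Illustrative sketch; names and formulas are assumptions, not the paper's code.)
    """
    a = np.clip(u, low, high)
    if a <= low:
        # P(clip(u) = low) = P(u <= low): log CDF at the lower bound
        return norm.logcdf(low, loc=mean, scale=std)
    if a >= high:
        # P(clip(u) = high) = P(u >= high): log survival function at the upper bound
        return norm.logsf(high, loc=mean, scale=std)
    # Interior action: ordinary Gaussian log-density
    return norm.logpdf(a, loc=mean, scale=std)

A standard estimator would use norm.logpdf(u, loc=mean, scale=std) even when u falls outside [low, high]; weighting the gradient of the term above by the return gives a REINFORCE-style update that accounts for the clipping, which is the kind of bound-aware estimator the abstract describes.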


