Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

03/13/2019
by Yunhao Tang, et al.

Due to the high variance of policy gradient estimates, on-policy optimization algorithms are plagued by low sample efficiency. In this work, we propose the Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased, low-variance alternative to previous baseline estimators on tasks with a binary action space, inspired by the recent ARM gradient estimator for discrete random variable models. We show that the ARM policy gradient estimator achieves variance reduction with theoretical guarantees, and that it leads to significantly more stable and faster convergence of policies parameterized by neural networks.
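For intuition, the single-variable ARM identity underlying the estimator states that for b ~ Bernoulli(σ(φ)), ∇φ E[f(b)] = E over u ~ Uniform(0,1) of (f(1[u > σ(−φ)]) − f(1[u < σ(−φ)])) (u − 1/2). Below is a minimal NumPy sketch of this identity; the helper name `arm_gradient` and the toy objective are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def arm_gradient(f, phi, rng):
    # One-sample ARM estimate of d/dphi E_{b ~ Bernoulli(sigmoid(phi))}[f(b)].
    # Illustrative helper (not the paper's code): augment with shared uniform
    # noise, evaluate f at two antithetic binary values, then merge.
    u = rng.uniform()
    b_pos = float(u > sigmoid(-phi))   # equals 1 with probability sigmoid(phi)
    b_neg = float(u < sigmoid(-phi))   # antithetic counterpart, same noise u
    return (f(b_pos) - f(b_neg)) * (u - 0.5)

# Toy sanity check against the exact gradient of a Bernoulli expectation.
rng = np.random.default_rng(0)
phi = 0.3
f = lambda b: (b - 0.49) ** 2
estimate = np.mean([arm_gradient(f, phi, rng) for _ in range(100_000)])
p = sigmoid(phi)
exact = (f(1.0) - f(0.0)) * p * (1.0 - p)  # d/dphi [p*f(1) + (1-p)*f(0)]
print(f"ARM estimate: {estimate:.5f}, exact: {exact:.5f}")
```

Both function evaluations share the same uniform draw u, which is what makes the pair antithetic and drives the variance reduction; roughly, in the policy-gradient setting, φ would be the per-state logit produced by the policy network and f the downstream return.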


Related research:

02/21/2018 · Clipped Action Policy Gradient
Many continuous control tasks have bounded action spaces and clip out-of...

05/04/2019 · ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables
To address the challenge of backpropagating the gradient through categor...

07/30/2018 · ARM: Augment-REINFORCE-Merge Gradient for Discrete Latent Variable Models
To backpropagate the gradients through discrete stochastic layers, we en...

06/18/2020 · DisARM: An Antithetic Gradient Estimator for Binary Latent Variables
Training models with discrete latent variables is challenging due to the...

04/09/2019 · L_0-ARM: Network Sparsification via Stochastic Binary Optimization
We consider network sparsification as an L_0-norm regularized binary opt...

02/12/2018 · Policy Gradients for Contextual Bandits
We study a generalized contextual-bandits problem, where there is a stat...

02/20/2021 · Causal Policy Gradients
Policy gradient methods can solve complex tasks but often fail when the ...