Coordinate-wise Control Variates for Deep Policy Gradients

07/11/2021
by Yuanyi Zhong, et al.

The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimator in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates; the variance-reduced policy gradient is then expected to yield higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions, and the effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates, and show that the resulting algorithm with proper regularization can achieve higher sample efficiency than scalar control variates on continuous control benchmarks.
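To make the scalar-versus-vector distinction concrete, the following is a minimal, self-contained sketch (not the paper's implementation): it uses synthetic stand-ins for the per-sample score function and return estimate, and compares the best single scalar baseline against a coordinate-wise baseline that picks one optimal value per gradient coordinate. All names and the data-generating process here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10_000, 5  # number of samples, gradient dimension

# Hypothetical stand-ins for the per-sample score function
# grad log pi(a_i | s_i) and the return estimate Q(s_i, a_i).
score = rng.normal(size=(N, D))
q = 1.0 + score @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

def second_moment(baseline):
    """Total second moment of the baseline-corrected gradient
    estimates g_i = score_i * (q_i - b), summed over coordinates."""
    g = score * (q[:, None] - baseline)
    return (g ** 2).mean(axis=0).sum()

# Scalar baseline: the single b minimizing the empirical second moment.
b_scalar = (score ** 2 * q[:, None]).sum() / (score ** 2).sum()

# Coordinate-wise baseline: one optimal b_j per gradient coordinate,
# b_j = E[score_j^2 * q] / E[score_j^2].
b_vec = (score ** 2 * q[:, None]).mean(axis=0) / (score ** 2).mean(axis=0)

# The coordinate-wise baseline can only do at least as well as the
# scalar one, since it minimizes each coordinate's term separately.
print(second_moment(b_scalar), second_moment(b_vec))
```

In the paper's actual setting the baseline is a learned function of the state (a vector-valued network head rather than a fixed constant as here), but the same per-coordinate argument explains why a vector-valued baseline can only match or reduce the variance achievable by a scalar one.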


