Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods

08/08/2019
by Ching-An Cheng, et al.

Policy gradient methods have demonstrated success in reinforcement learning tasks with high-dimensional continuous state and action spaces. However, they are also notoriously sample inefficient, which can be attributed, at least in part, to the high variance of Monte Carlo estimates of the gradient of the task objective. Previous research has contended with this problem by studying control variates (CVs) that reduce the variance of the estimates without introducing bias, including the early use of baselines, state-dependent CVs, and the more recent state-action-dependent CVs. In this work, we analyze the properties and drawbacks of these techniques and, surprisingly, find that they overlook an important fact: Monte Carlo gradient estimates are generated by trajectories of states and actions. We show that ignoring the correlation among the steps of a trajectory can result in suboptimal variance reduction, and we propose a simple fix: a class of "trajectory-wise" CVs that further drive down the variance. Trajectory-wise CVs can be constructed recursively and, like previous CVs for policy gradients, require only learning state-action value functions. We further prove that the proposed trajectory-wise CVs are optimal for variance reduction under reasonable assumptions.
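
To make the distinction concrete, below is a minimal NumPy sketch contrasting a vanilla Monte Carlo policy gradient estimator, a state-action-dependent CV, and a trajectory-wise-style CV that additionally subtracts zero-mean terms built from future steps of the same trajectory. Everything here (the toy tabular MDP, the softmax policy, the `q_hat` table, and all names) is invented for illustration and is not the paper's implementation; it only sketches the general idea, assuming a discrete action space where expectations under the policy can be computed exactly.

```python
# A minimal sketch, not the paper's algorithm: compare the variance of three
# unbiased policy gradient estimators on a toy tabular MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, horizon = 5, 3, 10

theta = rng.normal(size=(n_states, n_actions))  # tabular softmax policy params

# A hand-made Q estimate that captures this toy problem's advantage structure
# (the reward below is ~1 when a == s % n_actions); any learned critic would do.
q_hat = (np.arange(n_actions)[None, :]
         == (np.arange(n_states) % n_actions)[:, None]).astype(float)

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def grad_log_pi(s, a):
    # d log pi(a|s) / d theta: nonzero only in row s.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

def sample_trajectory():
    s, traj = 0, []
    for _ in range(horizon):
        a = rng.choice(n_actions, p=policy(s))
        r = rng.normal(loc=float(a == s % n_actions))  # noisy reward
        traj.append((s, a, r))
        s = int(rng.integers(n_states))                # random transitions
    return traj

def grad_estimate(traj, cv="none"):
    states = [s for s, _, _ in traj]
    actions = [a for _, a, _ in traj]
    rewards = np.array([r for _, _, r in traj])
    togo = np.cumsum(rewards[::-1])[::-1]              # reward-to-go G_t
    # Terms with zero mean given s_k: q_hat(s_k, a_k) - E_{a~pi}[q_hat(s_k, a)].
    e_q = np.array([policy(s) @ q_hat[s] for s in states])
    zero_mean = np.array([q_hat[s, a] for s, a in zip(states, actions)]) - e_q
    g = np.zeros_like(theta)
    for t, (s, a) in enumerate(zip(states, actions)):
        coeff = togo[t]
        if cv in ("state-action", "trajectory"):
            coeff -= q_hat[s, a]                       # subtract the CV ...
            # ... and add back grad E_{a~pi}[q_hat(s, a)] in closed form,
            # so the estimator stays unbiased.
            g[s] += policy(s) * (q_hat[s] - e_q[t])
        if cv == "trajectory":
            # Also cancel the zero-mean noise injected by *future* action
            # draws, exploiting correlation along the trajectory.
            coeff -= zero_mean[t + 1:].sum()
        g += grad_log_pi(s, a) * coeff
    return g

for cv in ("none", "state-action", "trajectory"):
    ests = np.stack([grad_estimate(sample_trajectory(), cv)
                     for _ in range(2000)])
    print(f"{cv:>12}: mean norm {np.linalg.norm(ests.mean(0)):.3f}, "
          f"avg per-coordinate variance {ests.var(axis=0).mean():.4f}")
```

All three estimators are unbiased: the state-action CV is paired with its exact expectation under the policy, and each future-step term has zero conditional expectation given the history. How much variance each CV removes in practice depends on how well q_hat tracks the true returns.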

Related research

03/20/2018 · Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Policy gradient methods have enjoyed great success in deep reinforcement...

07/24/2023 · Policy Gradient Optimal Correlation Search for Variance Reduction in Monte Carlo simulation and Maximum Optimal Transport
We propose a new algorithm for variance reduction when estimating f(X_T)...

07/11/2021 · Coordinate-wise Control Variates for Deep Policy Gradients
The control variates (CV) method is widely used in policy gradient estim...

05/29/2019 · Variance Reduction for Evolution Strategies via Structured Control Variates
Evolution Strategies (ES) are a powerful class of blackbox optimization ...

12/31/2019 · Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation
Sequence generation models are commonly refined with reinforcement learn...

06/28/2020 · Deep Bayesian Quadrature Policy Optimization
We study the problem of obtaining accurate policy gradient estimates. Th...

05/14/2019 · Trajectory-Based Off-Policy Deep Reinforcement Learning
Policy gradient methods are powerful reinforcement learning algorithms a...
