A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

01/03/2017
by Nithyanand Kota, et al.

The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by subtracting a baseline. However, fitting the baseline itself can suffer from underfitting or overfitting. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is a hyperparameter of the baseline estimation that adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach with two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
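The abstract only sketches the idea, so the following is a minimal illustration of what a K-fold baseline fit could look like, under the assumption that collected timesteps are split into K folds and each fold's baseline values are predicted by a model fit on the remaining K-1 folds. The linear ridge-regression baseline and the function names below are illustrative assumptions, not the paper's implementation.

    # Sketch of K-fold baseline estimation for policy gradients.
    # Assumption: each fold's baseline is fit on the other K-1 folds,
    # so the predictions used for advantage estimation are held-out.
    import numpy as np

    def kfold_baseline_values(features, returns, K=5, ridge=1e-3):
        """Estimate per-timestep baseline values with K-fold fitting.

        features: (N, d) array of state features for all collected timesteps
        returns:  (N,)  array of empirical returns (regression targets)
        Returns an (N,) array of baseline predictions; each prediction comes
        from a linear ridge model fit on the other K-1 folds.
        """
        N, d = features.shape
        folds = np.array_split(np.random.permutation(N), K)
        baseline = np.empty(N)
        for k in range(K):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
            X, y = features[train_idx], returns[train_idx]
            # Ridge-regularized least squares: w = (X^T X + lambda*I)^(-1) X^T y
            w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
            baseline[test_idx] = features[test_idx] @ w
        return baseline

    # Usage: advantages = returns - kfold_baseline_values(features, returns, K=5)

Intuitively, a small K reuses more data per fit (lower variance, more bias from shared fitting), while a larger K makes each baseline prediction closer to held-out (less bias, more variance), which is one way the K hyperparameter can trade off bias and variance in the baseline estimates.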

