Partial advantage estimator for proximal policy optimization

01/26/2023
by Xiulei Song, et al.

Estimation of value in policy gradient methods is a fundamental problem. Generalized Advantage Estimation (GAE) is an exponentially weighted estimator of the advantage function, analogous to the λ-return. It substantially reduces the variance of policy gradient estimates at the expense of some bias. In practice, because sampled trajectories are incomplete, a truncated GAE is used, which introduces a large bias into the estimate. To address this, instead of using the entire truncated GAE, we propose using only part of it when computing updates, which significantly reduces the bias caused by the incomplete trajectory. We perform experiments in MuJoCo and μRTS to investigate the effect of different partial coefficients and sampling lengths, and show that our partial GAE approach yields better empirical results in both environments.
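The estimator described above can be sketched in plain Python. This is a minimal sketch, not the authors' implementation. It computes the truncated GAE over a sampled segment of length T using the TD residual δ_t = r_t + γV(s_{t+1}) − V(s_t) and the recursion Â_t = δ_t + γλÂ_{t+1}, bootstrapping from V(s_T) at the cut. The function names `truncated_gae` and `partial_gae`, the rollout layout, and especially the rule that the partial coefficient keeps the earliest fraction of each segment are illustrative assumptions. The abstract does not state which part of the truncated GAE is kept. The rationale assumed here is that an advantage near the end of the segment sums only a few real TD residuals before relying on the bootstrap value, so it carries the most truncation bias.

```python
from typing import List


def truncated_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Truncated GAE over one rollout segment of length T.

    values[t] is the critic's V(s_t); last_value is V(s_T), used to bootstrap
    because the segment stops before the episode ends. dones[t] is True when
    the episode terminated after step t, so nothing is bootstrapped across it.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum: A_t = delta_t + gamma * lam * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages


def partial_gae(advantages: List[float], partial_coef: float) -> List[float]:
    """Hypothetical partial selection: keep only the earliest part of the segment.

    ASSUMPTION: partial_coef in (0, 1] is the fraction of the earliest
    timesteps whose advantages are used for the update; the later ones,
    which lean most on the bootstrap value, are discarded.
    """
    keep = max(1, int(partial_coef * len(advantages)))
    return advantages[:keep]


if __name__ == "__main__":
    adv = truncated_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                        [False, False, False], 0.0, gamma=1.0, lam=1.0)
    print(adv)                    # [3.0, 2.0, 1.0]
    print(partial_gae(adv, 0.5))  # [3.0]
```

In a PPO loop, the kept advantages (with their matching states, actions, and log-probabilities) would form the minibatch, while the discarded tail only serves to build the λ-weighted sums for the earlier steps. Under this reading, the partial coefficient and the sampling length T interact: a longer segment leaves more low-bias steps after the tail is cut.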
