Stochastic Variance Reduction for Policy Gradient Estimation

10/17/2017
by   Tianbing Xu, et al.

Recent advances in policy gradient methods and deep learning have demonstrated their applicability to complex reinforcement learning problems. However, the variance of the performance gradient estimates obtained from simulation is often excessive, leading to poor sample efficiency. In this paper, we apply stochastic variance reduced gradient (SVRG) to model-free policy gradient to significantly improve sample efficiency. The SVRG estimate is incorporated into a trust-region Newton conjugate gradient framework for policy optimization. On several MuJoCo tasks, our method achieves significantly better performance than state-of-the-art model-free policy gradient methods for robotic continuous control, such as trust region policy optimization (TRPO).
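
For readers unfamiliar with SVRG, the following is a minimal sketch of the generic SVRG update on a toy scalar least-squares problem. It is not the paper's policy gradient estimator and does not include the trust-region Newton conjugate gradient step; the function name `svrg`, the toy data, and the step size and loop counts are all assumptions chosen for illustration. The idea it shows: each inner step uses a single-sample gradient, corrected by the same sample's gradient at a snapshot point plus the full gradient at that snapshot, which keeps the estimate unbiased while shrinking its variance as the iterate approaches the snapshot.

```python
import random


def svrg(grad_i, n, w0, step, epochs, inner_steps, seed=0):
    """Generic SVRG loop for a scalar parameter (illustrative sketch).

    grad_i(w, i) returns the gradient of the i-th sample loss at w.
    """
    rng = random.Random(seed)
    w_snap = w0
    for _ in range(epochs):
        # Full-batch gradient at the snapshot, computed once per epoch.
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        w = w_snap
        for _ in range(inner_steps):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased for the full gradient at w.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step * g
        # The last inner iterate becomes the next snapshot.
        w_snap = w
    return w_snap


# Toy data (hypothetical): minimize (1/n) * sum_i 0.5 * (a_i * w - b_i)^2.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.1, 5.9, 8.0]


def grad_sample(w, i):
    # Gradient of 0.5 * (a_i * w - b_i)^2 with respect to w.
    return a[i] * (a[i] * w - b[i])


# Closed-form least-squares solution, used only to check the result.
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
w_hat = svrg(grad_sample, n=len(a), w0=0.0, step=0.02, epochs=30, inner_steps=50)
```

In a policy gradient setting, the per-sample gradient would presumably be replaced by a trajectory-based gradient estimate, but how the paper constructs that estimate is not stated in the abstract.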


research
10/30/2017

Sample-efficient Policy Optimization with Stein Control Variate

Policy gradient methods have achieved remarkable successes in solving ch...
research
06/01/2017

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previous...
research
11/06/2018

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

We study how the behavior of deep policy gradient algorithms reflects th...
research
07/29/2019

Hindsight Trust Region Policy Optimization

As reinforcement learning continues to drive machine intelligence beyond...
research
12/11/2018

KF-LAX: Kronecker-factored curvature estimation for control variate optimization in reinforcement learning

A key challenge for gradient based optimization methods in model-free re...
research
11/15/2018

Reward-estimation variance elimination in sequential decision processes

Policy gradient methods are very attractive in reinforcement learning du...
research
08/04/2020

Faded-Experience Trust Region Policy Optimization for Model-Free Power Allocation in Interference Channel

Policy gradient reinforcement learning techniques enable an agent to dir...
