Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

10/02/2019
by James A. Preiss, et al.

We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
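
The setting described in the abstract (linear dynamics, quadratic cost, Gaussian noise, a linear-Gaussian policy, and the REINFORCE estimator) is concrete enough to sketch in code. The snippet below is a minimal illustration, not taken from the paper: it rolls out a linear-Gaussian policy in an LQR environment, forms the REINFORCE (score-function) gradient estimate per trajectory, and measures the estimator's empirical variance over many rollouts. All parameter values (A, B, Q, R, horizon, and noise scales) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): REINFORCE gradient
# estimation for an LQR environment with a linear-Gaussian policy, plus an
# empirical measurement of the estimator's variance.
import numpy as np

rng = np.random.default_rng(0)

# Environment: x_{t+1} = A x_t + B u_t + w_t,  cost c_t = x_t' Q x_t + u_t' R u_t
n, m, H = 2, 1, 10                        # state dim, action dim, horizon (assumed)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(n)
R = 0.1 * np.eye(m)
sigma_w = 0.01                            # process noise std (assumed)
sigma_u = 0.1                             # policy exploration noise std (assumed)

K = np.zeros((m, n))                      # linear policy mean: u_t = K x_t + eps_t

def sample_reinforce_gradient():
    """One REINFORCE estimate of d(total cost)/dK from a single rollout."""
    x = np.array([1.0, 0.0])              # fixed initial state (assumed)
    total_cost = 0.0
    score = np.zeros_like(K)              # accumulates sum_t d log pi(u_t|x_t) / dK
    for _ in range(H):
        mean_u = K @ x
        u = mean_u + sigma_u * rng.standard_normal(m)
        total_cost += x @ Q @ x + u @ R @ u
        # Gaussian policy score w.r.t. K: ((u - Kx) / sigma^2) x'
        score += np.outer((u - mean_u) / sigma_u**2, x)
        x = A @ x + B @ u + sigma_w * rng.standard_normal(n)
    return total_cost * score             # score-function (REINFORCE) estimate

# Empirical variance of the estimator over many independent rollouts
grads = np.stack([sample_reinforce_gradient() for _ in range(2000)])
print("mean gradient estimate:\n", grads.mean(axis=0))
print("elementwise empirical variance:\n", grads.var(axis=0))
```

Varying the noise scales or the horizon in this sketch and re-measuring the empirical variance shows how the estimator's variance depends on the environment and noise parameters, which is the quantity the paper's bounds address.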

Related research

- Clipped Action Policy Gradient (02/21/2018): Many continuous control tasks have bounded action spaces and clip out-of...
- On All-Action Policy Gradients (10/24/2022): In this paper, we analyze the variance of stochastic policy gradient wit...
- Interpolating Between Softmax Policy Gradient and Neural Replicator Dynamics with Capped Implicit Exploration (06/04/2022): Neural replicator dynamics (NeuRD) is an alternative to the foundational...
- Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets (03/16/2023): Understanding and analyzing markets is crucial, yet analytical equilibri...
- Policy Optimization with Second-Order Advantage Information (05/09/2018): Policy optimization on high-dimensional continuous control tasks exhibit...
- Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods (11/06/2021): In reinforcement learning, continuous time is often discretized by a tim...
- Conditionally Gaussian Random Sequences for an Integrated Variance Estimator with Correlation between Noise and Returns (05/28/2019): Correlation between microstructure noise and latent financial logarithmi...
