Do Differentiable Simulators Give Better Policy Gradients?

02/02/2022
by   H. J. Terry Suh, et al.

Differentiable simulators promise faster computation for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with estimates based on first-order gradients. However, it remains unclear what factors determine the performance of the two estimators on complex landscapes that involve long-horizon planning and control of physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an α-order gradient estimator, with α∈[0,1], which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroth-order methods. We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator through numerical examples.
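The core idea of interpolating the two estimators can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact algorithm: it assumes a Gaussian-smoothed objective F(θ) = E_w[f(θ + w)] with w ~ N(0, σ²I), estimates the first-order gradient by averaging exact gradients at perturbed points, estimates the zeroth-order gradient with the standard score-function (likelihood-ratio) formula using only function values, and blends them with weight α. The function and parameter names are illustrative.

```python
import numpy as np

def alpha_order_gradient(f, grad_f, theta, alpha=0.5, sigma=0.1,
                         n_samples=1000, rng=None):
    """Sketch of an interpolated gradient estimator for the smoothed
    objective F(theta) = E_w[f(theta + w)], w ~ N(0, sigma^2 I).

    alpha = 1 uses only the first-order (analytic-gradient) estimate;
    alpha = 0 uses only the zeroth-order (score-function) estimate.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta, dtype=float)
    w = rng.normal(0.0, sigma, size=(n_samples,) + theta.shape)

    # First-order estimate: average the exact gradients at perturbed points.
    first = np.mean([grad_f(theta + wi) for wi in w], axis=0)

    # Zeroth-order estimate: likelihood-ratio formula, E[f(theta + w) w] / sigma^2,
    # which needs only function evaluations, not gradients.
    vals = np.array([f(theta + wi) for wi in w])
    zeroth = (np.mean(vals[:, None] * w.reshape(n_samples, -1), axis=0)
              .reshape(theta.shape) / sigma**2)

    # Convex combination of the two estimates.
    return alpha * first + (1.0 - alpha) * zeroth
```

For a smooth objective such as f(x) = x², both estimates agree in expectation (the exact gradient at θ = 1 is 2), so any α recovers roughly the same answer; the paper's point is that on stiff or discontinuous landscapes the first-order term can become biased or high-variance, making smaller α preferable.

```python
# Example: f(x) = x**2, exact gradient 2x, evaluated at theta = 1.0.
est = alpha_order_gradient(lambda x: float(x**2), lambda x: 2 * x,
                           theta=1.0, alpha=0.5, rng=0)
```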


Related research

An Analysis of Measure-Valued Derivatives for Policy Gradients (03/08/2022)
Reinforcement learning methods for robotics are increasingly successful ...

An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients (07/20/2021)
Reinforcement learning methods for robotics are increasingly successful ...

Faster Policy Learning with Continuous-Time Gradients (12/12/2020)
We study the estimation of policy gradients for continuous-time systems ...

DiCE: The Infinitely Differentiable Monte-Carlo Estimator (02/14/2018)
The score function estimator is widely used for estimating gradients of ...

A Note on Zeroth-Order Optimization on the Simplex (08/02/2022)
We construct a zeroth-order gradient estimator for a smooth function def...

Estimating or Propagating Gradients Through Stochastic Neurons (05/14/2013)
Stochastic neurons can be useful for a number of reasons in deep learnin...

Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training (08/13/2023)
Binarization of neural networks is a dominant paradigm in neural network...
