DiCE: The Infinitely Differentiable Monte-Carlo Estimator

02/14/2018
by Jakob Foerster, et al.

The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCGs), e.g. in reinforcement learning and meta-learning. While deriving first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order of gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms in higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation to perform the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN.
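The core of DiCE is the MagicBox operator, which evaluates to 1 in the forward pass but, under each differentiation, multiplies in the derivative of the enclosed log-probability terms, so score-function terms appear correctly at every order. A minimal sketch in PyTorch (the toy Gaussian-style log-probability, sample value, and cost below are illustrative assumptions, not taken from the paper):

```python
import torch

def magic_box(tau):
    # DiCE MagicBox: forward value is exactly 1 (exp(0)), but differentiating
    # it yields magic_box(tau) * d(tau), injecting score-function terms
    # correctly under repeated differentiation.
    return torch.exp(tau - tau.detach())

# Toy setup (hypothetical): one "sample" x with log-probability
# logp = -(x - theta)^2 and a fixed cost c.
theta = torch.tensor(0.5, requires_grad=True)
x, c = 1.0, 3.0
logp = -(x - theta) ** 2

dice_objective = magic_box(logp) * c  # forward value is just c

# First- and second-order gradients via plain automatic differentiation.
g1, = torch.autograd.grad(dice_objective, theta, create_graph=True)
g2, = torch.autograd.grad(g1, theta)
# Analytically: g1 = c * dlogp/dtheta = 3 * 2*(x - theta) = 3.0
#               g2 = c * ((dlogp/dtheta)^2 + d2logp/dtheta2) = 3 * (1 - 2) = -3.0
```

Because `magic_box` carries the log-probability dependence through every derivative, no new surrogate objective has to be constructed per gradient order; `torch.autograd.grad` with `create_graph=True` suffices.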


Related research

- Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning (09/23/2019)
- Storchastic: A Framework for General Stochastic Automatic Differentiation (04/01/2021)
- Do Differentiable Simulators Give Better Policy Gradients? (02/02/2022)
- A Geometric Theory of Higher-Order Automatic Differentiation (12/30/2018)
- Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation (06/24/2021)
- A unified view of likelihood ratio and reparameterization gradients (05/31/2021)
- Implicit Reparameterization Gradients (05/22/2018)
