Credit Assignment Techniques in Stochastic Computation Graphs

01/07/2019
by Theophane Weber, et al.

Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.
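As a rough illustration of the variance issue the abstract describes, the sketch below applies a score-function gradient estimator to a single Bernoulli node, with and without a constant baseline. It is a toy under stated assumptions, not the paper's general estimator: the names `score_function_grads` and `cost`, the loss itself, and the choice of the expected cost as the baseline are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def score_function_grads(theta, cost, baseline=0.0, n_samples=10000, seed=0):
    """Per-sample score-function estimates of d/dtheta E[cost(x)], x ~ Bernoulli(theta)."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n_samples):
        x = 1 if rng.random() < theta else 0
        # Score: d/dtheta log p(x; theta) for a Bernoulli variable.
        score = 1.0 / theta if x == 1 else -1.0 / (1.0 - theta)
        # Subtracting a constant baseline keeps the estimator unbiased,
        # because the score has zero mean under p(x; theta).
        grads.append((cost(x) - baseline) * score)
    return grads


def cost(x):
    # Hypothetical loss with a large constant offset, which inflates variance.
    return (x - 0.2) ** 2 + 10.0


theta = 0.3
# Exact gradient of theta * cost(1) + (1 - theta) * cost(0) with respect to theta.
exact = cost(1) - cost(0)
# Assumed baseline: the expected cost under the current theta.
expected_cost = theta * cost(1) + (1.0 - theta) * cost(0)

plain = score_function_grads(theta, cost)
baselined = score_function_grads(theta, cost, baseline=expected_cost)

print("exact gradient:      ", exact)
print("plain mean, variance:", statistics.mean(plain), statistics.variance(plain))
print("baselined mean, var.:", statistics.mean(baselined), statistics.variance(baselined))
```

Both estimators target the same gradient, but the baselined one should show a far smaller per-sample variance on this toy problem. The abstract's value functions, baselines and critics generalize this idea to arbitrary SCGs.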

research
11/24/2020

Hindsight Network Credit Assignment

We present Hindsight Network Credit Assignment (HNCA), a novel learning ...
research
06/29/2023

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

To make reinforcement learning more sample efficient, we need better cre...
research
11/19/2019

Variance Reduced Advantage Estimation with δ Hindsight Credit Assignment

Hindsight Credit Assignment (HCA) refers to a recently proposed family o...
research
12/11/2018

KF-LAX: Kronecker-factored curvature estimation for control variate optimization in reinforcement learning

A key challenge for gradient based optimization methods in model-free re...
research
06/13/2023

Differentiating Metropolis-Hastings to Optimize Intractable Densities

We develop an algorithm for automatic differentiation of Metropolis-Hast...
research
06/30/2022

On the Learning and Learnability of Quasimetrics

Our world is full of asymmetries. Gravity and wind can make reaching a p...
research
04/21/2023

Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single

We propose an evolution strategies-based algorithm for estimating gradie...
