Randomized Automatic Differentiation

by   Deniz Oktay, et al.

The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which allows unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.


page 1

page 2

page 3

page 4


Storchastic: A Framework for General Stochastic Automatic Differentiation

Modelers use automatic differentiation of computation graphs to implemen...

Gradients without Backpropagation

Using backpropagation to compute gradients of objective functions for op...

SVGD: A Virtual Gradients Descent Method for Stochastic Optimization

Inspired by dynamic programming, we propose Stochastic Virtual Gradient ...

Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator

Deep learning has seen tremendous success over the past decade in comput...

ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs

Optimizing the expected values of probabilistic processes is a central p...

Benchmarking Python Tools for Automatic Differentiation

In this paper we compare several Python tools for automatic differentiat...

Automatic Differentiation of Programs with Discrete Randomness

Automatic differentiation (AD), a technique for constructing new program...

Code Repositories


Experiment code for "Randomized Automatic Differentiation"

view repo



view repo

Please sign up or login with your details

Forgot password? Click here to reset