Efficient and Accurate Gradients for Neural SDEs

05/27/2021
by Patrick Kidger et al.

Neural SDEs combine many of the best qualities of both RNNs and SDEs, and as such are a natural choice for modelling many types of temporal dynamics. They offer memory efficiency, high-capacity function approximation, and strong priors on model space. Neural SDEs may be trained as VAEs or as GANs; in either case it is necessary to backpropagate through the SDE solve. In particular, this may be done by constructing a backwards-in-time SDE whose solution is the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational complexity, numerical errors in the SDE solve, and the cost of reconstructing Brownian motion. Here, we make several technical innovations to overcome these issues.

First, we introduce the reversible Heun method: a new SDE solver that is algebraically reversible, which reduces numerical gradient errors to almost zero and improves several test metrics by substantial margins over the state of the art. Moreover, it requires half as many function evaluations as comparable solvers, giving up to a 1.98× speedup.

Second, we introduce the Brownian interval: a new and computationally efficient way of exactly sampling and reconstructing Brownian motion, in contrast to previous reconstruction techniques, which are both approximate and relatively slow. This gives up to a 10.6× speed improvement over previous techniques.

Third, specifically when training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clipping and choice of activation function. This reduces computational cost (giving up to a 1.87× speedup) and removes the truncation errors of the double adjoint required for the gradient penalty, substantially improving several test metrics.

Altogether, these techniques offer substantial improvements over the state of the art.
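To make the moving parts concrete, here is a minimal sketch of backpropagating through a Neural SDE solve with a continuous adjoint, a reversible solver, and a Brownian-motion sample that can be queried again on the backward pass. It assumes the interface of the torchsde library; the NeuralSDE class, network sizes, and step size are illustrative placeholders rather than the authors' exact setup.

```python
import torch
import torchsde


class NeuralSDE(torch.nn.Module):
    # torchsde expects the drift f(t, y) and diffusion g(t, y) as methods,
    # plus noise_type / sde_type attributes.  The reversible Heun method is
    # a Stratonovich solver, hence sde_type = "stratonovich".
    noise_type = "diagonal"
    sde_type = "stratonovich"

    def __init__(self, dim, hidden):
        super().__init__()
        self.f_net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )
        self.g_net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, dim),
        )

    def f(self, t, y):  # drift
        return self.f_net(y)

    def g(self, t, y):  # diffusion (diagonal noise: same shape as y)
        return self.g_net(y)


batch, dim = 32, 4
sde = NeuralSDE(dim, hidden=64)
y0 = torch.randn(batch, dim)
ts = torch.linspace(0.0, 1.0, 10)

# The Brownian interval samples Brownian increments exactly and lets the
# same sample path be queried again on the backward pass, rather than
# storing or approximately reconstructing the whole path.
bm = torchsde.BrownianInterval(t0=0.0, t1=1.0, size=(batch, dim))

# Continuous adjoint (a backwards-in-time SDE for the gradients), run with
# the algebraically reversible Heun method in both directions, so the
# forward trajectory is reconstructed exactly and the gradients incur
# almost no additional numerical error.
ys = torchsde.sdeint_adjoint(
    sde, y0, ts, bm=bm, dt=0.05,
    method="reversible_heun",
    adjoint_method="adjoint_reversible_heun",
)

loss = ys[-1].mean()  # placeholder loss on the terminal value
loss.backward()
```

In a VAE or GAN setting the placeholder loss above would be replaced by the ELBO or the discriminator output respectively; for an SDE-GAN, the abstract's third contribution corresponds to clipping the discriminator's weights after each optimiser step to enforce a Lipschitz constraint, in place of a gradient penalty and its double adjoint.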
