Understanding Synthetic Gradients and Decoupled Neural Interfaces

When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

READ FULL TEXT

page 5

page 13

page 15

research
08/18/2016

Decoupled Neural Interfaces using Synthetic Gradients

Training directed neural networks typically requires forward-propagating...
research
06/21/2019

Fully Decoupled Neural Network Learning Using Delayed Gradients

Using the back-propagation (BP) to train neural networks requires a sequ...
research
12/22/2017

Benchmarking Decoupled Neural Interfaces with Synthetic Gradients

Artifical Neural Network are a particular class of learning system model...
research
09/21/2020

Feed-Forward On-Edge Fine-tuning Using Static Synthetic Gradient Modules

Training deep learning models on embedded devices is typically avoided s...
research
02/14/2022

Orthogonalising gradients to speed up neural network optimisation

The optimisation of neural networks can be sped up by orthogonalising th...
research
10/17/2017

Spontaneous Symmetry Breaking in Neural Networks

We propose a framework to understand the unprecedented performance and r...
research
02/19/2002

On model selection and the disability of neural networks to decompose tasks

A neural network with fixed topology can be regarded as a parametrizatio...

Please sign up or login with your details

Forgot password? Click here to reset