Adaptively Truncating Backpropagation Through Time to Control Gradient Bias

05/17/2019, by Christopher Aicher, et al.

Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too large. We propose an adaptive TBPTT scheme that converts the problem from choosing a temporal lag to one of choosing a tolerable amount of gradient bias. For many realistic RNNs, the TBPTT gradients decay geometrically for large lags; under this condition, we can control the bias by varying the truncation length adaptively. For RNNs with smooth activation functions, we prove that this bias controls the convergence rate of SGD with biased gradients for our non-convex loss. Using this theory, we develop a practical method for adaptively estimating the truncation length during training. We evaluate our adaptive TBPTT method on synthetic data and language modeling tasks and find that our adaptive TBPTT ameliorates the computational pitfalls of fixed TBPTT.
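To make the idea concrete, here is a minimal sketch of how one might pick a truncation length from per-lag gradient norms under the geometric-decay assumption. This is an illustrative estimator, not the paper's exact algorithm: the function name, the tail-ratio estimate of the decay rate, and the geometric-series bias bound are all assumptions chosen for clarity.

```python
import numpy as np

def choose_truncation_length(grad_norms, tol=0.05):
    """Pick the smallest lag K whose estimated relative tail bias is below `tol`.

    grad_norms[k] is the norm of the gradient contribution at lag k.
    Assumes geometric decay for large lags: the decay rate beta is
    estimated from successive tail ratios, and the mass truncated
    beyond lag K is bounded by a geometric series.
    (Illustrative sketch, not the paper's exact estimator.)
    """
    g = np.asarray(grad_norms, dtype=float)
    total = g.sum()
    # Estimate the geometric decay rate from successive ratios in the tail.
    ratios = g[1:] / np.maximum(g[:-1], 1e-12)
    beta = float(np.median(ratios[len(ratios) // 2:]))
    beta = min(beta, 0.999)  # keep the geometric-series bound finite
    for K in range(1, len(g)):
        # Geometric bound on the gradient mass discarded beyond lag K.
        tail_bound = g[K] * beta / (1.0 - beta)
        if tail_bound / total <= tol:
            return K
    return len(g)

# Example: gradient norms decaying by a factor of 0.5 per lag.
norms = [0.5 ** k for k in range(20)]
K = choose_truncation_length(norms, tol=0.05)  # → 4
```

In this toy setting the geometric rate is recovered exactly, so the chosen lag is the first one at which the bounded tail falls below 5% of the total gradient mass; in practice the per-lag norms would be measured periodically during training and the truncation length updated adaptively.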

Related research:

- Alternating Synthetic and Real Gradients for Neural Language Modeling (02/27/2019): Training recurrent neural networks (RNNs) with backpropagation through t...
- Unbiasing Truncated Backpropagation Through Time (05/23/2017): Truncated Backpropagation Through Time (truncated BPTT) is a widespread ...
- Can SGD Learn Recurrent Neural Networks with Provable Generalization? (02/04/2019): Recurrent Neural Networks (RNNs) are among the most popular models in se...
- Backpropagation Through Time For Networks With Long-Term Dependencies (03/26/2021): Backpropagation through time (BPTT) is a technique of updating tuned par...
- Memory-Efficient Backpropagation Through Time (06/10/2016): We propose a novel approach to reduce memory consumption of the backprop...
- Learning in the Machine: Random Backpropagation and the Learning Channel (12/08/2016): Random backpropagation (RBP) is a variant of the backpropagation algorit...
- Training Overparametrized Neural Networks in Sublinear Time (08/09/2022): The success of deep learning comes at a tremendous computational and ene...
