Unbiasing Truncated Backpropagation Through Time

05/23/2017
by Corentin Tallec, et al.

Truncated Backpropagation Through Time (truncated BPTT) is a widespread method for learning recurrent computational graphs. Truncated BPTT keeps the computational benefits of Backpropagation Through Time (BPTT) while removing the need for a complete backtrack through the whole data sequence at every step. However, truncation favors short-term dependencies: the gradient estimate of truncated BPTT is biased, so it does not benefit from the convergence guarantees of stochastic gradient theory. We introduce Anticipated Reweighted Truncated Backpropagation (ARTBP), an algorithm that keeps the computational benefits of truncated BPTT while providing unbiasedness. ARTBP works by using variable truncation lengths together with carefully chosen compensation factors in the backpropagation equation. We check the viability of ARTBP on two tasks. First, on a simple synthetic task where careful balancing of temporal dependencies at different scales is needed, truncated BPTT displays unreliable performance and, in worst-case scenarios, divergence, while ARTBP converges reliably. Second, on Penn Treebank character-level language modelling, ARTBP slightly outperforms truncated BPTT.
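
The idea of random truncation with compensation factors can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a scalar linear recurrence s_t = a * s_{t-1} + x_t with squared-error losses, a constant truncation probability c at every step, and a compensation factor of 1 / (1 - c) applied whenever the gradient is allowed to flow back one step. The names `forward`, `backward`, and `expected_gradient` are hypothetical helpers introduced for this illustration. Rather than sampling, the code enumerates every truncation pattern with its probability, so the expected gradient can be compared exactly against full BPTT.

```python
from itertools import product


def forward(a, s0, xs):
    """Run s_t = a * s_{t-1} + x_t and return [s_0, s_1, ..., s_T]."""
    states = [s0]
    for x in xs:
        states.append(a * states[-1] + x)
    return states


def backward(a, states, ys, cuts, c, reweight):
    """Gradient of sum_t 0.5 * (s_t - y_t)^2 with respect to a.

    cuts[t - 1] is True when backpropagation is truncated between
    step t + 1 and step t (for t = 1 .. T - 1). When reweight is True,
    every surviving backward step is scaled by 1 / (1 - c); this
    scaling rule is an assumption of this sketch.
    """
    T = len(ys)
    grad_a = 0.0
    delta_next = 0.0
    for t in range(T, 0, -1):
        delta = states[t] - ys[t - 1]           # immediate loss term
        if t < T and not cuts[t - 1]:           # gradient flows back from t + 1
            scale = 1.0 / (1.0 - c) if reweight else 1.0
            delta += scale * a * delta_next
        grad_a += delta * states[t - 1]         # d s_t / d a = s_{t-1}
        delta_next = delta
    return grad_a


def expected_gradient(a, states, ys, c, reweight):
    """Exact expectation over all independent truncation patterns."""
    T = len(ys)
    total = 0.0
    for cuts in product([False, True], repeat=T - 1):
        prob = 1.0
        for cut in cuts:
            prob *= c if cut else 1.0 - c
        total += prob * backward(a, states, ys, cuts, c, reweight)
    return total


# Illustrative values only (not taken from the paper).
a, s0, c = 0.9, 1.0, 0.3
xs = [0.5, -0.3, 0.8, 0.1]
ys = [0.0, 0.0, 0.0, 0.0]

states = forward(a, s0, xs)
full = backward(a, states, ys, [False] * (len(ys) - 1), c, reweight=False)
expected_reweighted = expected_gradient(a, states, ys, c, reweight=True)
expected_plain = expected_gradient(a, states, ys, c, reweight=False)

print("full BPTT gradient:          ", full)
print("expected, with reweighting:  ", expected_reweighted)
print("expected, without reweighting:", expected_plain)
```

Under these assumptions, the reweighted expectation matches the full BPTT gradient up to floating-point error, while the unweighted randomly truncated estimate underweights the long-range terms. The cost of unbiasedness in this sketch is higher variance, since each surviving backward step is amplified by 1 / (1 - c).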


