
Weighted Transformer Network for Machine Translation
State-of-the-art results on neural machine translation often use attenti...

DeFINE: DEep Factorized INput Word Embeddings for Neural Sequence Modeling
For sequence models with large word-level vocabularies, a majority of ne...

Controlling Computation versus Quality for Neural Sequence Models
Most neural networks utilize the same amount of compute for every exampl...

Explicitly Modeling Adaptive Depths for Transformer
The vanilla Transformer conducts a fixed number of computations over all...

Attending to Mathematical Language with Transformers
Mathematical expressions were generated, evaluated and used to train neu...

Reformer: The Efficient Transformer
Large Transformer models routinely achieve state-of-the-art results on a...

GLU Variants Improve Transformer
Gated Linear Units (arXiv:1612.08083) consist of the component-wise prod...

Depth-Adaptive Transformer
State-of-the-art sequence-to-sequence models perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make output predictions at different stages of the network and we investigate different ways to predict how much computation is required for a particular sequence. Unlike dynamic computation in Universal Transformers, which applies the same set of layers iteratively, we apply different layers at every step to adjust both the amount of computation as well as the model capacity. Experiments on machine translation benchmarks show that this approach can match the accuracy of a baseline Transformer while using only half the number of decoder layers.
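The early-exit idea described in the abstract can be illustrated with a toy sketch: distinct (untrained, randomly initialized) layers are applied in sequence, an output classifier is evaluated after each one, and decoding halts as soon as the classifier's confidence passes a threshold. This is only an illustrative sketch of the general mechanism, not the paper's actual architecture or training procedure; all class and parameter names here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class DepthAdaptiveDecoder:
    """Toy sketch of depth-adaptive decoding. Each depth has its own
    weights (unlike Universal Transformers, which reuse one set of
    layers iteratively), and a halting decision is made after every
    layer. Weights are random placeholders, not trained parameters."""

    def __init__(self, n_layers, d_model, vocab_size, threshold=0.5):
        # One distinct weight matrix per depth.
        self.layers = [rng.normal(size=(d_model, d_model)) * 0.1
                       for _ in range(n_layers)]
        # Shared output classifier used at every exit point.
        self.classifier = rng.normal(size=(d_model, vocab_size)) * 0.1
        self.threshold = threshold

    def decode_step(self, h):
        """Return (predicted token, depth used) for one hidden state h."""
        for depth, W in enumerate(self.layers, start=1):
            h = np.tanh(h @ W)                      # apply this depth's layer
            probs = softmax(h @ self.classifier)    # exit classifier
            # Exit early if confident, or if this is the last layer.
            if probs.max() >= self.threshold or depth == len(self.layers):
                return int(probs.argmax()), depth
```

Easy tokens exit after few layers, hard ones use the full stack, so average decoding cost adapts to the input while the maximum capacity stays available.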