Improving Transformer Models by Reordering their Sublayers

11/10/2019
by Ofir Press, et al.

Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern achieve better performance? We generate randomly ordered transformers and train them with the language modeling objective. We observe that some of these models are able to achieve better performance than the interleaved baseline, and that those successful variants tend to have more self-attention at the bottom and more feedforward sublayers at the top. We propose a new transformer design pattern that adheres to this property, the sandwich transformer, and show that it improves perplexity on the WikiText-103 language modeling benchmark, at no cost in parameters, memory, or training time.
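Below is a minimal sketch of the ordering idea described in the abstract, not the authors' implementation: a sublayer pattern string ('s' for self-attention, 'f' for feedforward) is built for either the interleaved baseline or a sandwich-style ordering, and then instantiated as a stack of PyTorch modules. The helper names (`sublayer_pattern`, `build_sublayers`) and the `sandwich_k` parameter are illustrative assumptions.

```python
import torch.nn as nn


def sublayer_pattern(n_pairs: int, sandwich_k: int = 0) -> str:
    """Build a sublayer ordering string: 's' = self-attention, 'f' = feedforward.

    sandwich_k = 0 gives the interleaved baseline ("sfsf...sf").
    sandwich_k = k pushes k self-attention sublayers to the bottom and k
    feedforward sublayers to the top, keeping the middle interleaved, so the
    total count of each sublayer type (and hence the parameter count) is
    unchanged.  (Illustrative sketch, not the paper's code.)
    """
    assert 0 <= sandwich_k < n_pairs
    middle = "sf" * (n_pairs - sandwich_k)
    return "s" * sandwich_k + middle + "f" * sandwich_k


def build_sublayers(pattern: str, d_model: int = 512, n_heads: int = 8,
                    d_ff: int = 2048) -> nn.ModuleList:
    """Instantiate the sublayer stack described by `pattern`.

    Any two patterns with the same number of 's' and 'f' symbols have the
    same number of parameters, memory footprint, and per-step compute.
    """
    layers = []
    for symbol in pattern:
        if symbol == "s":
            layers.append(nn.MultiheadAttention(d_model, n_heads))
        else:
            layers.append(nn.Sequential(
                nn.Linear(d_model, d_ff),
                nn.ReLU(),
                nn.Linear(d_ff, d_model)))
    return nn.ModuleList(layers)


# Interleaved baseline vs. a sandwich ordering with the same sublayer budget.
print(sublayer_pattern(16))                 # sfsfsf...sf
print(sublayer_pattern(16, sandwich_k=6))   # ssssss sf...sf ffffff
```

Because only the ordering changes, comparing the two patterns isolates the effect of where self-attention and feedforward sublayers sit in the stack, which is the comparison the abstract describes.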
