LazyFormer: Self Attention with Lazy Update

02/25/2021
by Chengxuan Ying, et al.

Improving the efficiency of Transformer-based language pre-training is an important task in NLP, especially for the self-attention module, which is computationally expensive. In this paper, we propose a simple but effective solution, called LazyFormer, which computes the self-attention distribution infrequently. LazyFormer is composed of multiple lazy blocks, each of which contains multiple Transformer layers. In each lazy block, the self-attention distribution is computed only once, in the first layer, and then reused in all upper layers. In this way, the computational cost can be largely reduced. We also provide several training tricks for LazyFormer. Extensive experiments demonstrate the effectiveness of the proposed method.
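
As a rough illustration of the lazy-update idea (not the authors' implementation), the PyTorch-style sketch below caches the attention distribution computed in a block's first layer and reuses it in the block's remaining layers, so the QK^T softmax is evaluated only once per block. All class names, dimensions, and the layers-per-block setting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LazyAttentionLayer(nn.Module):
    """One Transformer layer whose attention probabilities can be supplied from outside."""

    def __init__(self, d_model, n_heads):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn=None):
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        h = self.norm1(x)
        v = self.v_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        if attn is None:
            # First layer of a lazy block: compute the attention distribution.
            q = self.q_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
            k = self.k_proj(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
            attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        # Upper layers skip the QK^T softmax and reuse the cached distribution.
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        x = x + self.out_proj(ctx)
        x = x + self.ffn(self.norm2(x))
        return x, attn


class LazyBlock(nn.Module):
    """A lazy block: several Transformer layers sharing one attention distribution."""

    def __init__(self, d_model, n_heads, layers_per_block):
        super().__init__()
        self.layers = nn.ModuleList(
            [LazyAttentionLayer(d_model, n_heads) for _ in range(layers_per_block)]
        )

    def forward(self, x):
        attn = None  # recomputed only by the first layer of this block
        for layer in self.layers:
            x, attn = layer(x, attn)
        return x


# Example: a 12-layer encoder arranged as 6 lazy blocks of 2 layers each.
encoder = nn.Sequential(
    *[LazyBlock(d_model=768, n_heads=12, layers_per_block=2) for _ in range(6)]
)
out = encoder(torch.randn(1, 128, 768))  # (1, 128, 768)
```

In this sketch the grouping into blocks is the only hyperparameter of the scheme: more layers per block means fewer attention computations but a coarser approximation of per-layer attention.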


Related research

Skip-Attention: Improving Vision Transformers by Paying Less Attention (01/05/2023)
This work aims to improve the efficiency of vision transformers (ViT). W...

Tree Transformer: Integrating Tree Structures into Self-Attention (09/14/2019)
Pre-training Transformer from large-scale raw texts and fine-tuning on t...

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles (06/02/2023)
Transformers use the dense self-attention mechanism which gives a lot of...

SparseBERT: Rethinking the Importance Analysis in Self-attention (02/25/2021)
Transformer-based models are popular for natural language processing (NL...

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering (05/02/2020)
Transformer-based QA models use input-wide self-attention – i.e. across ...

Position Embedding Needs an Independent Layer Normalization (12/10/2022)
The Position Embedding (PE) is critical for Vision Transformers (VTs) du...

Relational Learning between Multiple Pulmonary Nodules via Deep Set Attention Transformers (04/12/2020)
Diagnosis and treatment of multiple pulmonary nodules are clinically imp...
