Token Dropping for Efficient BERT Pretraining

03/24/2022
by Le Hou, et al.

Transformer-based models generally allocate the same amount of computation to each token in a given sequence. We develop a simple but effective "token dropping" method to accelerate the pretraining of transformer models, such as BERT, without degrading performance on downstream tasks. In short, we drop unimportant tokens starting from an intermediate layer in the model so that the model focuses its computation on the important tokens; the dropped tokens are picked back up by the last layer of the model, so the model still produces full-length sequences. We leverage the built-in masked language modeling (MLM) loss to identify unimportant tokens with practically no computational overhead. In our experiments, this simple approach reduces the pretraining cost of BERT by 25% while achieving similar overall fine-tuning performance on standard downstream tasks.
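To make the mechanics concrete, here is a minimal PyTorch sketch of the token-dropping idea, not the authors' implementation: the layer split, the keep ratio, and the use of a precomputed per-token MLM-loss score as the importance signal are illustrative assumptions.

```python
# Sketch only: illustrates dropping tokens in the middle layers and restoring
# them before the last layer. Hyperparameters and the importance signal are
# assumptions for illustration, not the paper's exact configuration.
import torch
import torch.nn as nn


class TokenDroppingEncoder(nn.Module):
    """Toy encoder: full sequence in the lower layers, only the "important"
    tokens in the middle layers, full sequence again in the last layer."""

    def __init__(self, d_model=256, n_heads=4, n_layers=12, drop_from=6, keep_ratio=0.5):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.lower = nn.ModuleList(make_layer() for _ in range(drop_from))
        self.middle = nn.ModuleList(make_layer() for _ in range(n_layers - drop_from - 1))
        self.last = make_layer()
        self.keep_ratio = keep_ratio

    def forward(self, x, token_importance):
        # x: (batch, seq_len, d_model) token representations.
        # token_importance: (batch, seq_len) score per token, e.g. a running
        # average of each token's MLM loss from earlier training steps.
        for layer in self.lower:
            x = layer(x)

        batch, seq_len, d = x.shape
        k = max(1, int(seq_len * self.keep_ratio))
        # Keep the k hardest-to-predict tokens; drop the rest for the middle layers.
        keep_idx = token_importance.topk(k, dim=1).indices.sort(dim=1).values
        kept = x.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, d))

        for layer in self.middle:  # cheaper: these layers see shorter sequences
            kept = layer(kept)

        # Scatter the processed tokens back into the full-length sequence so the
        # last layer (and an MLM head after it) sees every position again.
        x = x.scatter(1, keep_idx.unsqueeze(-1).expand(-1, -1, d), kept)
        return self.last(x)


if __name__ == "__main__":
    enc = TokenDroppingEncoder()
    hidden = torch.randn(2, 128, 256)     # (batch, seq_len, d_model)
    importance = torch.rand(2, 128)       # stand-in for per-token MLM loss
    print(enc(hidden, importance).shape)  # torch.Size([2, 128, 256])
```

With a keep ratio of 0.5 as in this sketch, the middle layers attend over half as many tokens, so their attention and feed-forward cost drops accordingly, which is where the pretraining savings come from.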

Related research

05/24/2023  Revisiting Token Dropping Strategy in Efficient BERT Pretraining
Token dropping is a recently-proposed strategy to speed up the pretraini...

10/10/2022  Parameter-Efficient Tuning with Special Token Adaptation
Parameter-efficient tuning aims at updating only a small subset of param...

02/28/2023  Weighted Sampling for Masked Language Modeling
Masked Language Modeling (MLM) is widely used to pretrain language model...

02/04/2023  Representation Deficiency in Masked Language Modeling
Masked Language Modeling (MLM) has been one of the most prominent approa...

11/17/2022  Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers
Large-scale transformer models have become the de-facto architectures fo...

03/16/2022  AdapLeR: Speeding up Inference by Adaptive Length Reduction
Pre-trained language models have shown stellar performance in various do...

06/05/2020  Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
With the success of language pretraining, it is highly desirable to deve...
