Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

11/18/2021
by Shira Guskin, et al.

Limited computational budgets often prevent transformers from being used in production and from having their high accuracy utilized. TinyBERT addresses the computational efficiency by self-distilling BERT into a smaller transformer representation having fewer layers and smaller internal embedding. However, TinyBERT's performance drops when we reduce the number of layers by 50%, and drops even more abruptly when we reduce the number of layers by 75%, for advanced NLP tasks such as span question answering. Additionally, a separate model must be trained for each inference scenario with its distinct computational budget. In this work we present Dynamic-TinyBERT, a TinyBERT model that utilizes sequence-length reduction and Hyperparameter Optimization for enhanced inference efficiency per any computational budget. Dynamic-TinyBERT is trained only once, performing on-par with BERT and achieving an accuracy-speedup trade-off superior to any other efficient approaches (up to 3.3x speedup with <1% loss-drop). Upon publication, the code to reproduce our work will be open-sourced.
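The central mechanism named in the abstract, sequence-length reduction, can be illustrated with a short sketch: tokens judged least important are dropped between encoder layers, so deeper layers process progressively shorter sequences. The code below is not the authors' released implementation; the LengthAdaptiveEncoder class, the hidden-state-norm importance score, and the fixed keep-ratio schedule are illustrative assumptions (Dynamic-TinyBERT scores tokens using attention and selects the length schedule via hyperparameter optimization).

```python
import torch
import torch.nn as nn

class LengthAdaptiveEncoder(nn.Module):
    """Toy encoder that shrinks the token sequence between layers.

    Illustrative sketch only: Dynamic-TinyBERT derives token importance
    from attention; here the L2 norm of each hidden state is used as a
    stand-in, and the keep-ratio schedule below is hypothetical.
    """

    def __init__(self, hidden_size=312, num_layers=4, num_heads=12,
                 keep_ratios=(1.0, 0.75, 0.5, 0.25)):
        super().__init__()
        assert len(keep_ratios) == num_layers
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(hidden_size, num_heads,
                                       batch_first=True)
            for _ in range(num_layers)
        ])
        self.keep_ratios = keep_ratios

    def forward(self, hidden):                      # (batch, seq, hidden)
        for layer, ratio in zip(self.layers, self.keep_ratios):
            hidden = layer(hidden)
            keep = max(1, int(hidden.size(1) * ratio))
            # Importance proxy: L2 norm of each token's hidden state.
            scores = hidden.norm(dim=-1)            # (batch, seq)
            # Keep the top-k tokens, restoring their original order.
            idx = scores.topk(keep, dim=1).indices.sort(dim=1).values
            hidden = hidden.gather(
                1, idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1)))
        return hidden

encoder = LengthAdaptiveEncoder()
out = encoder(torch.randn(2, 128, 312))
print(out.shape)  # fewer tokens survive to the final layer
```

In the paper, the per-layer length schedule is the knob that hyperparameter optimization tunes to meet a given computational budget; it is hard-coded here only for clarity.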


