Accelerating Pre-trained Language Models via Calibrated Cascade

12/29/2020
by Lei Li, et al.

Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by exiting at a shallow layer without passing through the entire model. In this paper, we analyze the working mechanism of dynamic early exiting and find that it cannot achieve a satisfactory trade-off between inference speed and performance. On the one hand, the PLMs' representations in shallow layers are not sufficient for accurate prediction; on the other hand, the internal off-ramps cannot provide reliable exiting decisions. To remedy this, we instead propose CascadeBERT, which dynamically selects a properly sized, complete model in a cascading manner. To obtain more reliable model selection, we further devise a difficulty-aware objective that encourages the model's output class probabilities to reflect the real difficulty of each instance. Extensive experimental results demonstrate the superiority of our proposal over strong baselines for PLM acceleration, including both dynamic early-exiting and knowledge-distillation methods.
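To make the cascading idea concrete, here is a minimal Python/PyTorch sketch of confidence-based cascade inference: a small but complete model answers first, and only low-confidence ("hard") instances are escalated to a larger model. The callable interface, function name, and threshold value below are illustrative assumptions for this sketch, not the paper's released implementation.

import torch
import torch.nn.functional as F

def cascade_predict(models, thresholds, inputs):
    """Run a cascade of complete models, smallest first.

    `models` is a list of callables mapping `inputs` to class logits;
    `thresholds` holds one confidence cutoff per model except the last,
    which always answers. All names and values here are assumptions.
    """
    for model, tau in zip(models[:-1], thresholds):
        probs = F.softmax(model(inputs), dim=-1)
        conf, pred = probs.max(dim=-1)
        # Confident (calibrated) prediction suggests an easy instance:
        # exit the cascade without invoking the larger models.
        if conf.item() >= tau:
            return pred.item()
    # Hard instance: fall back to the largest complete model.
    return models[-1](inputs).argmax(dim=-1).item()

# Toy usage with stand-in "models" (any logit-returning callables work):
small = lambda x: torch.tensor([[2.5, 0.1]])  # confident small model
large = lambda x: torch.tensor([[0.2, 0.3]])  # fallback large model
print(cascade_predict([small, large], thresholds=[0.9], inputs=None))  # -> 0

Because each stage is a complete model rather than an internal off-ramp, the confidence used for the exit decision comes from a full forward pass, which is the distinction the abstract draws against layer-wise early exiting.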
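The difficulty-aware objective can likewise be sketched as a margin regularizer that pushes the gold-class probability of easy instances above that of hard ones, so that output confidence tracks instance difficulty. How the 0/1 difficulty labels are obtained (e.g., from whether an instance is classified correctly) and the margin value are assumptions of this sketch, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def difficulty_aware_margin_loss(logits, labels, difficulty, margin=0.1):
    """Margin regularizer sketch: the gold-class probability of each
    easy instance (difficulty == 0) should exceed that of each hard
    instance (difficulty == 1) by at least `margin`."""
    gold_prob = F.softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    easy = gold_prob[difficulty == 0]
    hard = gold_prob[difficulty == 1]
    if easy.numel() == 0 or hard.numel() == 0:
        return logits.new_zeros(())
    # Pairwise hinge over all (easy, hard) pairs in the batch.
    diff = easy.unsqueeze(1) - hard.unsqueeze(0)
    return F.relu(margin - diff).mean()

Adding such a term to the task loss would penalize the model whenever a hard instance receives higher confidence than an easy one, which is what makes the cascade's confidence-based model selection more reliable.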
