Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation

04/07/2020
by   Bowen Wu, et al.

Recently, BERT has become an essential ingredient of various NLP deep models due to its effectiveness and universal usability. However, the online deployment of BERT is often blocked by its large number of parameters and high computational cost. Plenty of studies have shown that knowledge distillation is effective in transferring knowledge from BERT into a model with far fewer parameters. Nevertheless, current BERT distillation approaches mainly focus on task-specific distillation; such methodologies sacrifice the general semantic knowledge that makes BERT universally usable. In this paper, we propose a sentence-representation-approximation oriented distillation framework that distills pre-trained BERT into a simple LSTM-based model without specifying any task. Consistent with BERT, the distilled model can perform transfer learning via fine-tuning to adapt to any sentence-level downstream task, and it can further cooperate with task-specific distillation procedures. Experimental results on multiple NLP tasks from the GLUE benchmark show that our approach outperforms other task-specific distillation methods and even much larger models such as ELMo, with significantly improved efficiency.
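The core idea is to train a lightweight LSTM student to approximate BERT's sentence-level representation on unlabeled text, and then fine-tune the student on downstream tasks just as BERT would be. The snippet below is a minimal, hypothetical PyTorch sketch of that general recipe; the MSE objective, the max-pooled bidirectional LSTM student, and the linear projection to the teacher's hidden size are illustrative assumptions, not the paper's exact architecture or loss.

```python
# Sketch (not the authors' code) of non-task-specific distillation by sentence
# representation approximation: a small BiLSTM student learns to match BERT's
# pooled sentence representation with an MSE loss on unlabeled sentences.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
teacher = BertModel.from_pretrained("bert-base-uncased").eval()  # frozen teacher

class LSTMStudent(nn.Module):
    # Hypothetical student: embedding -> BiLSTM -> max-pool -> projection to 768-d.
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=384, out_dim=768):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, out_dim)  # map to teacher's hidden size

    def forward(self, input_ids):
        emb = self.embedding(input_ids)
        outputs, _ = self.lstm(emb)                # (batch, seq_len, 2 * hidden_dim)
        pooled = outputs.max(dim=1).values         # simple max-pooling over time
        return self.proj(pooled)                   # (batch, out_dim)

student = LSTMStudent(vocab_size=tokenizer.vocab_size)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
mse = nn.MSELoss()

def distill_step(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Teacher sentence representation: BERT's pooled [CLS] output.
        target = teacher(**batch).pooler_output
    pred = student(batch["input_ids"])
    loss = mse(pred, target)                       # approximate the teacher's representation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage on a small batch of unlabeled sentences:
# loss = distill_step(["The movie was great.", "I did not like it at all."])
```

After this representation-approximation stage, the student could be fine-tuned on a GLUE task in the same way BERT is, and, as the abstract notes, optionally combined with a task-specific distillation objective.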


research
06/04/2021
ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression
Pretrained language models (PLMs) such as BERT adopt a training paradigm...

research
09/15/2021
EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation
Pre-trained language models have shown remarkable results on various NLP...

research
10/02/2019
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
As Transfer Learning from large-scale pre-trained models becomes more pr...

research
10/28/2022
BEBERT: Efficient and robust binary ensemble BERT
Pre-trained BERT models have achieved impressive accuracy on natural lan...

research
07/12/2021
A Flexible Multi-Task Model for BERT Serving
In this demonstration, we present an efficient BERT-based multi-task (MT...

research
09/03/2019
Transfer Fine-Tuning: A BERT Case Study
A semantic equivalence assessment is defined as a task that assesses sem...

research
07/21/2020
Understanding BERT Rankers Under Distillation
Deep language models such as BERT pre-trained on large corpus have given...
