CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT

12/22/2022
by Dan DeGenaro, et al.

Large language models with hundreds of millions, or even billions, of parameters have performed extremely well on a variety of natural language processing (NLP) tasks. Their widespread use and adoption, however, are hindered by the lack of availability and portability of sufficiently large computational resources. This paper proposes a knowledge distillation (KD) technique building on the work of LightMBERT, a student model of multilingual BERT (mBERT). By repeatedly distilling mBERT through increasingly compressed, top-layer-distilled teacher assistant networks, CAMeMBERT aims to improve upon the time and space complexities of mBERT while keeping the loss of accuracy below an acceptable threshold. At present, CAMeMBERT achieves an average accuracy of around 60.1%, which is subject to change after future improvements to the hyperparameters used in fine-tuning.
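The cascade described in the abstract amounts to a chain of distillation stages: each teacher assistant is distilled from the larger model above it, then serves as the teacher for the next, smaller network, until the final student is produced. Below is a minimal PyTorch-style sketch of that idea, assuming a plain soft-label (KL-divergence) distillation objective; the function names, temperature, optimizer settings, and model interfaces are illustrative assumptions, not the paper's exact training procedure, which additionally involves top-layer distillation of mBERT.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KD loss: KL divergence between temperature-scaled
    teacher and student output distributions (Hinton-style).
    The temperature value here is an illustrative choice."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)


def distill(teacher, student, loader, epochs=1, lr=5e-5, temperature=2.0):
    """One stage of the cascade: train `student` to match `teacher`'s outputs.
    Assumes models map a batch of inputs directly to logits."""
    teacher.eval()
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs in loader:
            with torch.no_grad():
                teacher_logits = teacher(inputs)
            student_logits = student(inputs)
            loss = distillation_loss(student_logits, teacher_logits, temperature)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student


def cascade_distill(teacher, assistants, final_student, loader):
    """Cascading KD: each assistant (e.g. with progressively fewer layers)
    is distilled from the previous, larger model and then becomes the
    teacher for the next, smaller one."""
    current_teacher = teacher
    for assistant in assistants:
        current_teacher = distill(current_teacher, assistant, loader)
    return distill(current_teacher, final_student, loader)
```

In this sketch, `cascade_distill(mbert, [assistant_9L, assistant_6L], student_3L, loader)` would distill mBERT into a 9-layer assistant, that assistant into a 6-layer one, and finally into a 3-layer student; the specific layer counts are hypothetical and only illustrate the intended compression schedule.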
