Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling

10/10/2022
by Haw-Shiuan Chang, et al.

Ensembling BERT models often significantly improves accuracy, but at the cost of significantly more computation and a larger memory footprint. In this work, we propose Multi-CLS BERT, a novel ensembling method for CLS-based prediction tasks that is almost as efficient as a single BERT model. Multi-CLS BERT uses multiple CLS tokens with a parameterization and objective that encourage their diversity. Thus, instead of fine-tuning each BERT model in an ensemble (and running them all at test time), we need only fine-tune our single Multi-CLS BERT model (and run the one model at test time, ensembling just the multiple final CLS embeddings). To test its effectiveness, we build Multi-CLS BERT on top of a state-of-the-art pretraining method for BERT (Aroca-Ouellette and Rudzicz, 2020). In experiments on GLUE and SuperGLUE, we show that our Multi-CLS BERT reliably improves both overall accuracy and confidence estimation. When only 100 training samples are available in GLUE, the Multi-CLS BERT_Base model can even outperform the corresponding BERT_Large model. We analyze the behavior of our Multi-CLS BERT, showing that it has many of the same characteristics and behaviors as a typical BERT 5-way ensemble, but with nearly 4 times less computation and memory.
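The core idea can be pictured at the level of the classification head: rather than averaging the predictions of several independently fine-tuned BERT models, a single encoder carries several CLS tokens, and their final embeddings are combined at test time. The snippet below is a minimal, hypothetical sketch of that idea in PyTorch, not the authors' released implementation; the MultiCLSClassifier class, the per-CLS linear heads, and the averaging of per-CLS logits are illustrative assumptions.

```python
# Hypothetical sketch of ensembling multiple CLS embeddings from a single
# encoder forward pass. NOT the authors' implementation: the class name,
# per-CLS heads, and logit averaging are assumptions for illustration.
import torch
import torch.nn as nn


class MultiCLSClassifier(nn.Module):
    def __init__(self, encoder, hidden_size: int, num_labels: int, num_cls: int = 5):
        super().__init__()
        self.encoder = encoder  # any BERT-style encoder exposing last_hidden_state
        self.num_cls = num_cls
        # One lightweight classification head per CLS position, so the heads
        # can specialize and their predictions can be ensembled.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, num_labels) for _ in range(num_cls)
        )

    def forward(self, input_ids, attention_mask):
        # Assumes the first `num_cls` positions of the input hold the inserted
        # CLS tokens, so their final hidden states act as sentence embeddings.
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                      # (batch, seq_len, hidden)
        cls_states = hidden[:, : self.num_cls]   # (batch, num_cls, hidden)
        # Ensemble within one model: average the per-CLS logits instead of
        # averaging the predictions of N separately fine-tuned BERTs.
        logits = torch.stack(
            [head(cls_states[:, i]) for i, head in enumerate(self.heads)], dim=1
        )                                         # (batch, num_cls, num_labels)
        return logits.mean(dim=1)                 # (batch, num_labels)
```

Because all CLS embeddings come from one forward pass through one encoder, the test-time cost stays close to that of a single BERT model, which is the efficiency argument the abstract makes.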


Related research

Improving BERT-based Query-by-Document Retrieval with Multi-Task Optimization (02/01/2022)
Query-by-document (QBD) retrieval is an Information Retrieval task in wh...

Efficient Fine-Tuning of BERT Models on the Edge (05/03/2022)
Resource-constrained devices are increasingly the deployment targets of ...

HABERTOR: An Efficient and Effective Deep Hatespeech Detector (10/17/2020)
We present our HABERTOR model for detecting hatespeech in large scale us...

Investigating Novel Verb Learning in BERT: Selectional Preference Classes and Alternation-Based Syntactic Generalization (11/04/2020)
Previous studies investigating the syntactic abilities of deep learning ...

Feature Engineering vs BERT on Twitter Data (10/28/2022)
In this paper, we compare the performances of traditional machine learni...

A Flexible Multi-Task Model for BERT Serving (07/12/2021)
In this demonstration, we present an efficient BERT-based multi-task (MT...

Layer-wise Guided Training for BERT: Learning Incrementally Refined Document Representations (10/12/2020)
Although BERT is widely used by the NLP community, little is known about...
