BEBERT: Efficient and robust binary ensemble BERT

10/28/2022
by Jiayi Tian, et al.

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive number of parameters hinders efficient deployment on edge devices. Binarizing the BERT models can significantly alleviate this issue but comes with a severe accuracy drop compared with their full-precision counterparts. In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap. To the best of our knowledge, this is the first work to employ ensemble techniques on binary BERTs, yielding BEBERT, which achieves superior accuracy while retaining computational efficiency. Furthermore, we remove the knowledge distillation procedures during ensembling to speed up the training process without compromising accuracy. Experimental results on the GLUE benchmark show that the proposed BEBERT significantly outperforms existing binary BERT models in accuracy and robustness with a 2x speedup in training time. Moreover, BEBERT incurs only a negligible accuracy loss of 0.3% relative to the full-precision baseline while saving 15x and 13x in FLOPs and model size, respectively. In addition, BEBERT also outperforms other compressed BERTs in accuracy by up to 6.7%.
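To make the two core ingredients of the abstract concrete, the sketch below illustrates one plausible way they can be combined: XNOR-style weight binarization (the sign of each weight scaled by the layer's mean absolute weight) and simple logit averaging over several independently trained binary BERTs. This is a minimal illustration under our own assumptions, not the paper's implementation; the helper names binarize_weight and ensemble_logits, and the assumption of Hugging Face-style models exposing a .logits field, are illustrative only.

    import torch

    def binarize_weight(weight: torch.Tensor) -> torch.Tensor:
        # XNOR-style binarization (an assumption here, not taken from the paper):
        # each weight becomes +1 or -1, scaled by the mean absolute value so the
        # binary layer roughly preserves the magnitude of its full-precision counterpart.
        alpha = weight.abs().mean()
        return alpha * torch.sign(weight)

    @torch.no_grad()
    def ensemble_logits(models, inputs):
        # Average the logits of several independently trained binary BERTs.
        # `models` is a list of hypothetical Hugging Face-style classifiers whose
        # forward pass returns an object with a `.logits` attribute.
        stacked = torch.stack([m(**inputs).logits for m in models])
        return stacked.mean(dim=0)

    # Usage sketch (names are placeholders):
    # predictions = ensemble_logits(binary_berts, tokenized_batch).argmax(dim=-1)

Averaging logits is only one possible combination rule; boosting-style weighted votes would follow the same pattern with per-model weights applied before the mean.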


Related research

09/13/2020 · BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks
As a pre-trained Transformer model, BERT (Bidirectional Encoder Represen...

04/07/2020 · Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Recently, BERT has become an essential ingredient of various NLP deep mo...

03/25/2022 · MKQ-BERT: Quantized BERT with 4-bits Weights and Activations
Recently, pre-trained Transformer based language models, such as BERT, h...

11/16/2022 · Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
As an application of Natural Language Processing (NLP) techniques, finan...

11/27/2020 · CoRe: An Efficient Coarse-refined Training Framework for BERT
In recent years, BERT has made significant breakthroughs on many natural...

05/29/2020 · SAFER: A Structure-free Approach for Certified Robustness to Adversarial Word Substitutions
State-of-the-art NLP models can often be fooled by human-unaware transfo...

10/31/2022 · QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
Limited computational budgets often prevent transformers from being used...
