Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

02/27/2020
by   Prakhar Ganesh, et al.

Transformer-based models pre-trained on large-scale corpora achieve state-of-the-art accuracy for natural language processing tasks, but are too resource-hungry and compute-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy is model compression, which has attracted extensive attention. This paper summarizes the branches of research on compressing Transformers, focusing on the especially popular BERT model. BERT's complex architecture means that a compression technique that is highly effective on one part of the model, e.g., attention layers, may be less successful on another part, e.g., fully connected layers. In this systematic study, we identify the state of the art in compression for each part of BERT, clarify current best practices for compressing large-scale Transformer models, and provide insights into the inner workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving a lightweight, accurate, and generic natural language processing model.
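To make the distinction between model parts concrete, the sketch below applies magnitude (L1 unstructured) pruning with different sparsity levels to the attention projections and the fully connected (feed-forward) sublayers of a BERT encoder. This is a minimal illustration only, assuming the HuggingFace transformers BertModel and PyTorch's torch.nn.utils.prune utility; the submodule names (attention.self.query/key/value, intermediate.dense, output.dense) follow that implementation, and the sparsity amounts are arbitrary placeholders rather than values recommended by the paper.

# Minimal sketch: prune attention and feed-forward sublayers of BERT at
# different sparsity levels. Assumes HuggingFace transformers and PyTorch;
# the amounts below are illustrative placeholders, not paper recommendations.
import torch.nn.utils.prune as prune
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

ATTENTION_SPARSITY = 0.2  # hypothetical: prune 20% of attention weights
FFN_SPARSITY = 0.5        # hypothetical: prune 50% of feed-forward weights

for layer in model.encoder.layer:
    # Attention projections (query/key/value and the attention output projection).
    attention_linears = [
        layer.attention.self.query,
        layer.attention.self.key,
        layer.attention.self.value,
        layer.attention.output.dense,
    ]
    # Fully connected (feed-forward) sublayers.
    ffn_linears = [layer.intermediate.dense, layer.output.dense]

    for module in attention_linears:
        prune.l1_unstructured(module, name="weight", amount=ATTENTION_SPARSITY)
    for module in ffn_linears:
        prune.l1_unstructured(module, name="weight", amount=FFN_SPARSITY)

    # Make the pruning permanent by removing the re-parametrization masks.
    for module in attention_linears + ffn_linears:
        prune.remove(module, "weight")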


Related research

Block-wise Bit-Compression of Transformer-based Models (03/16/2023)
With the popularity of the recent Transformer-based models represented b...

Optimizing Inference Performance of Transformers on CPUs (02/12/2021)
The Transformer architecture revolutionized the field of natural languag...

schuBERT: Optimizing Elements of BERT (05/09/2020)
Transformers have gradually become a key component for many state...

Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics (10/04/2021)
Much of recent progress in NLU was shown to be due to models' learning d...

Poor Man's BERT: Smaller and Faster Transformer Models (04/08/2020)
The ongoing neural revolution in Natural Language Processing has recentl...

You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient (06/04/2021)
Despite superior performance on various natural language processing task...

SHAQ: Single Headed Attention with Quasi-Recurrence (08/18/2021)
Natural Language Processing research has recently been dominated by larg...
