TrimBERT: Tailoring BERT for Trade-offs

02/24/2022
by Sharath Nittur Sridhar, et al.

Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and fine-tuning, which limits their wider adoption. While self-attention layers have been well studied, a strong justification for including the intermediate layers that follow them remains missing in the literature. In this work, we show that reducing the number of intermediate layers in BERT-Base results in minimal loss in fine-tuning accuracy on downstream tasks while significantly decreasing model size and training time. We further mitigate two key bottlenecks by replacing all softmax operations in the self-attention layers with a computationally simpler alternative and removing half of all layernorm operations. This further decreases the training time while maintaining a high level of fine-tuning accuracy.
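The abstract does not specify which softmax replacement is used or which layernorms are removed. The following is a minimal PyTorch sketch of such a trimmed encoder block, assuming a sigmoid in place of softmax, a single remaining layernorm, and removal of the intermediate feed-forward sub-layer; the class name, dimensions, and the choice of sigmoid are illustrative assumptions, not the paper's implementation.

```python
import math
import torch
import torch.nn as nn


class TrimmedEncoderLayer(nn.Module):
    """Sketch of a trimmed BERT encoder block: the intermediate feed-forward
    sub-layer is dropped, attention weights use a sigmoid instead of softmax
    (assumed alternative), and only one layernorm is kept per block."""

    def __init__(self, hidden=768, heads=12, dropout=0.1):
        super().__init__()
        assert hidden % heads == 0
        self.heads = heads
        self.head_dim = hidden // heads
        self.qkv = nn.Linear(hidden, 3 * hidden)  # fused query/key/value projection
        self.out = nn.Linear(hidden, hidden)
        self.norm = nn.LayerNorm(hidden)          # single layernorm instead of two
        self.drop = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        b, t, h = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, tokens, head_dim)
        q, k, v = (z.reshape(b, t, self.heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        # element-wise sigmoid instead of a row-normalized softmax
        attn = torch.sigmoid(scores)
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, h)
        # residual connection + the remaining layernorm; no intermediate FFN follows
        return self.norm(x + self.drop(self.out(ctx)))


# Usage: a BERT-Base-sized input of 2 sequences x 128 tokens x 768 features.
x = torch.randn(2, 128, 768)
y = TrimmedEncoderLayer()(x)
print(y.shape)  # torch.Size([2, 128, 768])
```

The sigmoid keeps the attention weights bounded in (0, 1) without the row-wise normalization that softmax requires, which is one plausible "computationally simpler alternative" in the spirit of the abstract; the actual choice in the paper may differ.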


