
Undivided Attention: Are Intermediate Layers Necessary for BERT?

by Sharath Nittur Sridhar, et al.

In recent years, BERT-based models have been extremely successful at solving a variety of natural language processing (NLP) tasks such as reading comprehension, natural language inference, and sentiment analysis. All BERT-based architectures share the same basic building block: a self-attention layer followed by a block of intermediate (feed-forward) layers. However, a strong justification for including these intermediate layers is missing from the literature. In this work we investigate how important the intermediate layers are to overall network performance on downstream tasks. We show that reducing the number of intermediate layers in BERT-Base, and modifying the architecture accordingly, incurs minimal loss in fine-tuning accuracy on downstream tasks while reducing the model's parameter count and training time. Additionally, we use the centered kernel alignment (CKA) similarity metric and probing classifiers to demonstrate that removing intermediate layers has little impact on the learned self-attention representations.
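The abstract's layer-similarity analysis relies on CKA. As a minimal sketch of how such a comparison can be computed, the snippet below implements the *linear* variant of centered kernel alignment (the abstract does not specify which kernel the authors use, so linear CKA is an assumption here); `X` and `Y` are hypothetical matrices of hidden representations for the same examples taken from two models or layers:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representations X (n, d1) and Y (n, d2)
    extracted for the same n examples. Returns a value in [0, 1],
    where 1 means the representations are identical up to rotation
    and isotropic scaling."""
    # Center each feature dimension across examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 normalized by ||X^T X||_F * ||Y^T Y||_F.
    numerator = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    denominator = (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))
    return float(numerator / denominator)
```

Comparing a layer's representations with themselves yields a score of 1, and unrelated random representations score near 0; in the paper's setting, high CKA between the original and reduced models would indicate that removing intermediate layers barely changes what the self-attention layers learn.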

