Further Boosting BERT-based Models by Duplicating Existing Layers: Some Intriguing Phenomena inside BERT

01/25/2020
by   Wei-Tsung Kao, et al.

Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box, and much previous work has tried to lift the veil of BERT and understand each layer's functionality. In this paper, we find that removing or duplicating most layers in BERT does not change its outputs. This observation holds across a wide variety of BERT-based models. Based on it, we propose a very simple method to boost the performance of BERT: by duplicating some layers of a BERT-based model to make it deeper (with no extra training required at this step), the model obtains better performance on downstream tasks after fine-tuning.
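The layer-duplication step the abstract describes can be sketched mechanically: deep-copy selected encoder layers and re-insert each copy directly after its original, so the network gets deeper without any new parameters being trained. The toy `TinyEncoder` below is an assumption for illustration, standing in for a real BERT encoder stack (in HuggingFace Transformers the analogous list would be `model.bert.encoder.layer`); which layers to duplicate is a hyperparameter the paper explores, not fixed here.

```python
import copy

import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """A stand-in for a BERT-style encoder: a stack of same-shape layers."""
    def __init__(self, num_layers=4, dim=8):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def duplicate_layers(model, layer_indices):
    """Deep-copy the chosen layers and insert each copy right after its
    original, making the model deeper with no extra training."""
    new_layers = []
    for i, layer in enumerate(model.layers):
        new_layers.append(layer)
        if i in layer_indices:
            new_layers.append(copy.deepcopy(layer))  # weight-identical copy
    model.layers = nn.ModuleList(new_layers)
    return model

model = TinyEncoder(num_layers=4)
deeper = duplicate_layers(model, {1, 2})  # 4 layers -> 6 layers
```

After this step the duplicated model is fine-tuned on the downstream task as usual; the duplication itself only copies existing weights.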

Related research

12/22/2020
Undivided Attention: Are Intermediate Layers Necessary for BERT?
In recent times, BERT-based models have been extremely successful in sol...

04/09/2022
FoundationLayerNorm: Scaling BERT and GPT to 1,000 Layers
The mainstream BERT/GPT model contains only 10 to 20 layers, and there i...

05/02/2022
BERTops: Studying BERT Representations under a Topological Lens
Proposing scoring functions to effectively understand, analyze and learn...

03/12/2021
Explaining and Improving BERT Performance on Lexical Semantic Change Detection
Type- and token-based embedding architectures are still competing in lex...

04/17/2023
Classification of US Supreme Court Cases using BERT-Based Techniques
Models based on bidirectional encoder representations from transformers ...

05/22/2020
Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction
Deep learning (DL) based predictive models from electronic health record...

01/29/2021
Fine-tuning BERT-based models for Plant Health Bulletin Classification
In the era of digitization, different actors in agriculture produce nume...
