schuBERT: Optimizing Elements of BERT

05/09/2020
by Ashish Khetan, et al.

Transformers <cit.> have gradually become a key component of many state-of-the-art natural language representation models. A recent Transformer-based model, BERT <cit.>, achieved state-of-the-art results on various natural language processing tasks, including GLUE, SQuAD v1.1, and SQuAD v2.0. This model, however, is computationally expensive and has a huge number of parameters. In this work we revisit the architecture choices of BERT in an effort to obtain a lighter model. We focus on reducing the number of parameters, yet our methods can be applied to other objectives such as FLOPs or latency. We show that much more efficient light BERT models can be obtained by reducing algorithmically chosen architecture design dimensions rather than the number of Transformer encoder layers. In particular, our schuBERT gives 6.6% higher average accuracy on the GLUE and SQuAD datasets than BERT with three encoder layers, while having the same number of parameters.
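To make the parameter-budget comparison concrete, here is a minimal sketch, not the authors' code: a simplified BERT-style parameter count (it omits biases, LayerNorm, and the pooler, and the concrete width values below are illustrative assumptions chosen only to land near the 3-layer budget). It shows how shrinking width dimensions such as the hidden size and feed-forward size can match the parameter count of simply dropping encoder layers.

```python
# A minimal sketch, not the authors' code: a rough parameter count for a
# BERT-style encoder. Biases, LayerNorm, and the pooler are omitted, and
# all concrete dimension choices below are illustrative assumptions.

def encoder_params(layers: int, hidden: int, ff: int,
                   vocab: int = 30522, max_pos: int = 512) -> int:
    """Approximate parameter count of a BERT-style Transformer encoder."""
    # Word, position, and segment (2 types) embeddings.
    embeddings = (vocab + max_pos + 2) * hidden
    # Per layer: Q/K/V/output projections plus the two feed-forward matrices.
    # (The number of attention heads partitions `hidden` but does not change
    # the count in this simplification.)
    per_layer = 4 * hidden * hidden + 2 * hidden * ff
    return embeddings + layers * per_layer

# BERT-base: 12 layers, hidden 768, feed-forward 3072 (~110M parameters).
full = encoder_params(12, 768, 3072)
# Depth-reduced baseline: keep the width, drop to 3 encoder layers.
shallow = encoder_params(3, 768, 3072)
# Width-reduced alternative: keep 12 layers, shrink hidden/feed-forward
# (values picked here only to roughly match the 3-layer budget).
narrow = encoder_params(12, 456, 1824)

print(f"12-layer base:   {full / 1e6:.1f}M parameters")
print(f"3-layer wide:    {shallow / 1e6:.1f}M parameters")
print(f"12-layer narrow: {narrow / 1e6:.1f}M parameters")
```

Under this rough count, the narrow 12-layer configuration lands at roughly the same budget as the 3-layer model; the paper's point is that such width dimensions should be chosen algorithmically rather than by uniformly removing layers.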


