A Primer in BERTology: What we know about how BERT works

02/27/2020
by   Anna Rogers, et al.

Transformer-based models are now widely used in NLP, but much about their inner workings remains poorly understood. This paper describes what is known to date about the popular BERT model (Devlin et al., 2019), synthesizing over 40 analysis studies. We also provide an overview of proposed modifications to the model and its training regime. We then outline directions for further research.

