Open Sesame: Getting Inside BERT's Linguistic Knowledge

by   Yongjie Lin, et al.
Yale University

How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally-defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically-oriented encoding on higher layers. We conclude then that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora.


page 1

page 2

page 3

page 4


Does BERT agree? Evaluating knowledge of structure dependence through agreement relations

Learning representations that accurately model semantics is an important...

Probing for the Usage of Grammatical Number

A central quest of probing is to uncover how pre-trained models encode a...

Exploring the Role of BERT Token Representations to Explain Sentence Probing Results

Several studies have been carried out on revealing linguistic features c...

Using Paraphrases to Study Properties of Contextual Embeddings

We use paraphrases as a unique source of data to analyze contextualized ...

Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs

Though state-of-the-art sentence representation models can perform tasks...

PoWER-BERT: Accelerating BERT inference for Classification Tasks

BERT has emerged as a popular model for natural language understanding. ...

Please sign up or login with your details

Forgot password? Click here to reset