Open Sesame: Getting Inside BERT's Linguistic Knowledge

06/04/2019
by Yongjie Lin, et al.

How and to what extent does BERT encode syntactically-sensitive hierarchical information or positionally-sensitive linear information? Recent work has shown that contextual representations like BERT perform well on tasks that require sensitivity to linguistic structure. We present here two studies which aim to provide a better understanding of the nature of BERT's representations. The first of these focuses on the identification of structurally-defined elements using diagnostic classifiers, while the second explores BERT's representation of subject-verb agreement and anaphor-antecedent dependencies through a quantitative assessment of self-attention vectors. In both cases, we find that BERT encodes positional information about word tokens well on its lower layers, but switches to a hierarchically-oriented encoding on higher layers. We conclude then that BERT's representations do indeed model linguistically relevant aspects of hierarchical structure, though they do not appear to show the sharp sensitivity to hierarchical structure that is found in human processing of reflexive anaphora.
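To make the first study's approach concrete, below is a minimal, hypothetical sketch of a layer-wise diagnostic (probing) classifier over BERT's hidden states. The model name, the toy sentences, and the probed property (whether a token is the main-clause subject) are illustrative assumptions for this sketch, not the authors' exact data or protocol.

```python
# Sketch: train a linear probe on each BERT layer to predict a structural property
# of a word (here, a toy "is the main-clause subject" label). Assumptions: the
# HuggingFace bert-base-uncased checkpoint and the tiny example set below.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Toy examples: (sentence, index of the probed word, label).
examples = [
    ("the dog near the cats barks", 1, 1),            # "dog" is the main subject
    ("the dog near the cats barks", 4, 0),            # "cats" is an intervening noun
    ("the keys to the cabinet are missing", 1, 1),    # "keys" is the main subject
    ("the keys to the cabinet are missing", 4, 0),    # "cabinet" is an intervening noun
]

def word_embedding(sentence, word_idx, layer):
    """Mean-pool the hidden states of the word pieces belonging to one word."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer][0]   # (seq_len, dim)
    word_ids = enc.word_ids(0)                          # word index per word piece
    piece_positions = [i for i, w in enumerate(word_ids) if w == word_idx]
    return hidden[piece_positions].mean(dim=0).numpy()

# Fit and score a linear probe at each layer (0 = embeddings, 12 = top of BERT-base).
for layer in range(13):
    X = np.stack([word_embedding(s, i, layer) for s, i, _ in examples])
    y = np.array([label for _, _, label in examples])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    print(f"layer {layer:2d}: train accuracy = {probe.score(X, y):.2f}")
```

The second study's attention analysis could be sketched along similar lines by loading the model with output_attentions=True and measuring, per head and layer, how much attention an agreeing verb or reflexive places on its subject or antecedent relative to intervening nouns.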

Related research

Does BERT agree? Evaluating knowledge of structure dependence through agreement relations (08/26/2019)
Learning representations that accurately model semantics is an important...

Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis (11/24/2020)
As the name implies, contextualized representations of language are typi...

Probing for the Usage of Grammatical Number (04/19/2022)
A central quest of probing is to uncover how pre-trained models encode a...

Exploring the Role of BERT Token Representations to Explain Sentence Probing Results (04/03/2021)
Several studies have been carried out on revealing linguistic features c...

Using Paraphrases to Study Properties of Contextual Embeddings (07/12/2022)
We use paraphrases as a unique source of data to analyze contextualized ...

Investigating BERT's Knowledge of Language: Five Analysis Methods with NPIs (09/05/2019)
Though state-of-the-art sentence representation models can perform tasks...

PoWER-BERT: Accelerating BERT inference for Classification Tasks (01/24/2020)
BERT has emerged as a popular model for natural language understanding. ...
