Exploring the Role of BERT Token Representations to Explain Sentence Probing Results

04/03/2021
by   Hosein Mohebbi, et al.

Several studies have investigated the linguistic features captured by BERT. This is usually achieved by training a diagnostic classifier on the representations obtained from different layers of BERT, with the resulting classification accuracy interpreted as the model's ability to encode the corresponding linguistic property. Despite providing insights, these studies have overlooked the potential role of token representations. In this paper, we analyze the representation space of BERT in search of distinct and meaningful subspaces that can explain probing results. Based on a set of probing tasks and with the help of attribution methods, we show that BERT tends to encode meaningful knowledge in specific token representations (which are often ignored in standard classification setups), allowing the model to detect syntactic and semantic abnormalities and to distinctly separate grammatical number and tense subspaces.
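
As a rough illustration of the setup the abstract describes, here is a minimal sketch (not the paper's actual code): it trains a linear diagnostic probe on frozen representations from a single BERT layer and then computes a simple gradient-times-input attribution over individual token representations, one example of the attribution methods mentioned above. The layer index, toy task, sentences, and labels are all hypothetical placeholders.

```python
# Minimal sketch: layer-wise diagnostic probing of BERT plus
# gradient x input attribution over token representations.
# The toy task (singular vs. plural subject) is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
bert.eval()

sentences = ["The dog runs fast.", "The dogs run fast."]  # placeholder data
labels = torch.tensor([0, 1])                             # placeholder labels

enc = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden_states = bert(**enc).hidden_states  # tuple of 13 layers (0 = embeddings)

layer = 8                          # hypothetical choice of intermediate layer
reps = hidden_states[layer]        # (batch, seq_len, hidden_size)
cls_reps = reps[:, 0, :]           # standard setup: probe the [CLS] token only

# Linear diagnostic probe trained on the frozen representations.
probe = torch.nn.Linear(reps.size(-1), 2)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(cls_reps), labels)
    loss.backward()
    opt.step()

# Attribution: gradient x input over every token representation, showing
# which tokens (not just [CLS]) carry the probed property.
tok_reps = reps.clone().requires_grad_(True)
logits = probe(tok_reps)           # probe applied at every token position
gold = labels.view(-1, 1, 1).expand(-1, logits.size(1), 1)
logits.gather(-1, gold).sum().backward()
saliency = (tok_reps.grad * tok_reps).sum(-1)  # (batch, seq_len)

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
print(list(zip(tokens, saliency[0].tolist())))
```

With only two sentences the probe overfits trivially; in practice the probe would be trained and evaluated on a full probing dataset, and the per-token saliency scores compared across layers.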

06/06/2019

Visualizing and Measuring the Geometry of BERT

Transformer architectures show significant promise for natural language ...
04/30/2020

How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

Attribution methods assess the contribution of inputs (e.g., words) to t...
04/19/2022

Probing for the Usage of Grammatical Number

A central quest of probing is to uncover how pre-trained models encode a...
05/15/2021

The Low-Dimensional Linear Geometry of Contextualized Word Representations

Black-box probing models can reliably extract linguistic features like t...
11/24/2020

Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis

As the name implies, contextualized representations of language are typi...
06/04/2019

Open Sesame: Getting Inside BERT's Linguistic Knowledge

How and to what extent does BERT encode syntactically-sensitive hierarch...
04/13/2021

DirectProbe: Studying Representations without Classifiers

Understanding how linguistic structures are encoded in contextualized em...