Exploring the Role of BERT Token Representations to Explain Sentence Probing Results

by Hosein Mohebbi et al.

Several studies have investigated the linguistic features captured by BERT. This is usually done by training a diagnostic classifier on the representations obtained from different layers of BERT; the resulting classification accuracy is then interpreted as the model's ability to encode the corresponding linguistic property. Despite providing insights, these studies have left out the potential role of token representations. In this paper, we analyze the representation space of BERT in search of distinct and meaningful subspaces that can explain probing results. Based on a set of probing tasks, and with the help of attribution methods, we show that BERT tends to encode meaningful knowledge in specific token representations (which are often ignored in standard classification setups), allowing the model to detect syntactic and semantic abnormalities and to distinctively separate grammatical-number and tense subspaces.
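The diagnostic-classifier setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the layer representations here are simulated with random features plus a weak label-dependent signal, whereas in practice they would be token or sentence vectors extracted from a given BERT layer.

```python
# Sketch of a diagnostic (probing) classifier. `layer_repr` stands in for
# representations taken from one BERT layer; here it is simulated so the
# example is self-contained and runnable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sentences, hidden_size = 400, 64
labels = rng.integers(0, 2, size=n_sentences)  # e.g. singular vs. plural

# Noise plus a weak linear signal correlated with the probed property.
signal = np.outer(labels, rng.normal(size=hidden_size))
layer_repr = rng.normal(size=(n_sentences, hidden_size)) + 0.5 * signal

X_train, X_test, y_train, y_test = train_test_split(
    layer_repr, labels, test_size=0.25, random_state=0)

# A simple linear probe; its held-out accuracy is read as a measure of
# how well the layer encodes the property.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = probe.score(X_test, y_test)
```

The paper's point is that when such probes pool over all tokens (or use only the [CLS] token), they can miss the fact that the relevant information is concentrated in a few specific token representations.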





Related papers

Visualizing and Measuring the Geometry of BERT

Transformer architectures show significant promise for natural language ...

How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

Attribution methods assess the contribution of inputs (e.g., words) to t...

Probing for the Usage of Grammatical Number

A central quest of probing is to uncover how pre-trained models encode a...

The Low-Dimensional Linear Geometry of Contextualized Word Representations

Black-box probing models can reliably extract linguistic features like t...

Picking BERT's Brain: Probing for Linguistic Dependencies in Contextualized Embeddings Using Representational Similarity Analysis

As the name implies, contextualized representations of language are typi...

Open Sesame: Getting Inside BERT's Linguistic Knowledge

How and to what extent does BERT encode syntactically-sensitive hierarch...

DirectProbe: Studying Representations without Classifiers

Understanding how linguistic structures are encoded in contextualized em...