Information-Theoretic Probing for Linguistic Structure

by Tiago Pimentel, et al.

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually know about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations for that task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that such models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic formalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest-performing probe one can, even if it is more complex, since it will result in a tighter estimate. The empirical portion of our paper focuses on obtaining tight estimates for how much information BERT knows about parts of speech in a set of five typologically diverse languages that are often underrepresented in parsing research, plus English, totaling six languages. We find BERT accounts for only at most 5
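
The claim that a better probe gives a tighter estimate can be illustrated with a minimal sketch. It assumes the standard decomposition I(T; R) = H(T) - H(T | R), where T is the linguistic label and R the representation, and uses the fact that a probe's cross-entropy upper-bounds H(T | R), so H(T) minus that cross-entropy is a lower bound on the mutual information. The function names, the toy labels, and the two hypothetical probes below are illustrative assumptions, not taken from the paper; in practice the cross-entropy would be measured on held-out data and H(T) is itself only an estimate.

```python
import math
from collections import Counter


def entropy(labels):
    """Plug-in estimate of H(T) in bits from empirical label frequencies."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def cross_entropy(labels, probs):
    """Average negative log2-probability the probe assigns to the gold labels."""
    return -sum(math.log2(p[t]) for t, p in zip(labels, probs)) / len(labels)


def mi_lower_bound(labels, probs):
    """H(T) minus probe cross-entropy: a lower bound on I(T; R) (in expectation)."""
    return entropy(labels) - cross_entropy(labels, probs)


# Toy gold labels (hypothetical data, balanced over two tags).
labels = ["NOUN", "VERB", "NOUN", "VERB"]

# Two hypothetical probes: each entry is the predicted distribution for one token.
weak_probe = [
    {"NOUN": 0.6, "VERB": 0.4},
    {"NOUN": 0.4, "VERB": 0.6},
    {"NOUN": 0.6, "VERB": 0.4},
    {"NOUN": 0.4, "VERB": 0.6},
]
strong_probe = [
    {"NOUN": 0.9, "VERB": 0.1},
    {"NOUN": 0.1, "VERB": 0.9},
    {"NOUN": 0.9, "VERB": 0.1},
    {"NOUN": 0.1, "VERB": 0.9},
]

weak_bound = mi_lower_bound(labels, weak_probe)
strong_bound = mi_lower_bound(labels, strong_probe)
print(f"weak probe bound:   {weak_bound:.3f} bits")
print(f"strong probe bound: {strong_bound:.3f} bits")
```

Both numbers are lower bounds on the same quantity, so the larger one (from the probe with lower cross-entropy) is the tighter estimate; this is the sense in which the higher-performing probe is preferred, regardless of its complexity.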




