Information-Theoretic Probing for Linguistic Structure

by Tiago Pimentel, et al.

The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually know about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations for that task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that simpler models make better probes; the logic is that such models will identify linguistic structure but not learn the task itself. We propose an information-theoretic formalization of probing as estimating mutual information, which contradicts this received wisdom: one should always select the highest-performing probe one can, even if it is more complex, since it will result in a tighter estimate. The empirical portion of our paper focuses on obtaining tight estimates of how much information BERT knows about parts of speech in a set of five typologically diverse languages that are often underrepresented in parsing research, plus English, totaling six languages. We find BERT accounts for only at most 5
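The claim that a better probe gives a tighter estimate can be illustrated with a small numeric sketch. It relies on a standard fact: a probe's cross-entropy H_q(T | R) is never smaller than the true conditional entropy H(T | R), so H(T) - H_q(T | R) is a lower bound on the mutual information I(T; R). The joint distribution, the tag and representation names, and both probes below are hypothetical toy values chosen for illustration; they are not taken from the paper's experiments.

```python
import math

# Hypothetical toy joint distribution p(t, r) over two tags and two
# discrete "representation" values (an assumption for illustration only).
joint = {
    ("NOUN", "r0"): 0.4,
    ("NOUN", "r1"): 0.1,
    ("VERB", "r0"): 0.1,
    ("VERB", "r1"): 0.4,
}
TAGS = ("NOUN", "VERB")


def marginal(joint, axis):
    """Sum the joint over the other variable (axis 0 = tag, 1 = representation)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def cross_entropy(joint, probe):
    """H_q(T | R) = -sum over (t, r) of p(t, r) * log2 q(t | r)."""
    return -sum(p * math.log2(probe[r][t]) for (t, r), p in joint.items() if p > 0)


def mi_lower_bound(joint, probe):
    """H(T) - H_q(T | R), which is at most I(T; R) for any probe q."""
    return entropy(marginal(joint, 0)) - cross_entropy(joint, probe)


p_r = marginal(joint, 1)

# A probe that matches the true conditional p(t | r): the bound is tight.
exact_probe = {r: {t: joint[(t, r)] / p_r[r] for t in TAGS} for r in p_r}

# A weaker, under-confident probe: the bound is looser.
weak_probe = {
    "r0": {"NOUN": 0.6, "VERB": 0.4},
    "r1": {"NOUN": 0.4, "VERB": 0.6},
}

true_mi = mi_lower_bound(joint, exact_probe)
weak_estimate = mi_lower_bound(joint, weak_probe)

print(f"estimate from exact probe: {true_mi:.4f} bits")
print(f"estimate from weak probe:  {weak_estimate:.4f} bits")
```

In this toy setting the weaker probe yields a smaller estimate than the exact one, which mirrors the argument that restricting probe capacity can only loosen the bound, not make the estimate more faithful.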




Bird's Eye: Probing for Linguistic Graph Structures with a Simple Information-Theoretic Approach

NLP has a rich history of representing our prior understanding of langua...

Designing and Interpreting Probes with Control Tasks

Probes, supervised models trained to predict properties (like parts-of-s...

An information theoretic view on selecting linguistic probes

There is increasing interest in assessing the linguistic knowledge encod...

On the Possibility of Rewarding Structure Learning Agents: Mutual Information on Linguistic Random Sets

We present a first attempt to elucidate an Information-Theoretic approac...

Information-Theoretic Probing with Minimum Description Length

To measure how well pretrained representations encode some linguistic pr...

A Latent-Variable Model for Intrinsic Probing

The success of pre-trained contextualized representations has prompted r...

A Non-Linear Structural Probe

Probes are models devised to investigate the encoding of knowledge – e.g...