Information-Theoretic Probing with Minimum Description Length

03/27/2020
by Elena Voita, et al.

To measure how well pretrained representations encode some linguistic property, it is common to use the accuracy of a probe, i.e. a classifier trained to predict the property from the representations. Despite the widespread adoption of probes, differences in their accuracy fail to adequately reflect differences in representations. For example, they do not substantially favour pretrained representations over randomly initialized ones. Analogously, their accuracy can be similar when probing for genuine linguistic labels and when probing for random synthetic tasks. To see reasonable differences in accuracy with respect to these random baselines, previous work had to constrain either the amount of probe training data or the size of the probe model. Instead, we propose an alternative to standard probes: information-theoretic probing with minimum description length (MDL). With MDL probing, training a probe to predict labels is recast as teaching it to effectively transmit the data. Therefore, the measure of interest changes from probe accuracy to the description length of labels given representations. In addition to probe quality, the description length evaluates "the amount of effort" needed to achieve that quality; this effort characterizes either (i) the size of the probing model or (ii) the amount of data needed to achieve high quality. We consider two methods for estimating MDL which can be easily implemented on top of standard probing pipelines: variational coding and online coding. We show that these methods agree in their results and are more informative and stable than standard probes.
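Of the two estimators, online (prequential) coding is the simpler to reproduce on top of an existing probing setup: the data is split into increasing portions t_1 < t_2 < ... < t_S = n, the probe is repeatedly retrained on everything "transmitted" so far, and each new block costs its cross-entropy under the current probe, so L_online = t_1 log2 K + sum_i [-log2 p_i(y_block | x_block)] bits for a K-class task. The sketch below is a minimal illustration of this idea under stated assumptions, not the authors' implementation: it assumes scikit-learn's LogisticRegression as the probe, numpy arrays X (representations) and y (labels) that are already shuffled with every class present in the first portion, and portion fractions similar in spirit to those used in the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def online_codelength(X, y, fractions=(0.001, 0.002, 0.004, 0.008, 0.016,
                                           0.032, 0.0625, 0.125, 0.25, 0.5, 1.0)):
        """Estimate the online codelength (in bits) of labels y given
        representations X, and the compression ratio against the trivial
        uniform code. Hypothetical sketch: assumes shuffled data in which
        every class occurs within the first portion."""
        n = len(y)
        K = len(np.unique(y))
        # End points t_1 < t_2 < ... < t_S = n of the data portions.
        ts = sorted({max(K, int(f * n)) for f in fractions})
        # The first t_1 labels are sent with the uniform code: t_1 * log2 K bits.
        total_bits = ts[0] * np.log2(K)
        for t_prev, t_next in zip(ts[:-1], ts[1:]):
            # Retrain the probe on everything transmitted so far.
            probe = LogisticRegression(max_iter=1000).fit(X[:t_prev], y[:t_prev])
            proba = probe.predict_proba(X[t_prev:t_next])
            # Column index of each true label in probe.classes_ (sorted).
            cols = np.searchsorted(probe.classes_, y[t_prev:t_next])
            p_true = np.clip(proba[np.arange(t_next - t_prev), cols], 1e-12, None)
            # Cost of transmitting this block: its cross-entropy, in bits.
            total_bits += -np.log2(p_true).sum()
        uniform_bits = n * np.log2(K)
        return total_bits, uniform_bits / total_bits

A lower codelength (equivalently, a higher compression ratio) means the labels are easier to learn to predict from the representations; it is this gap, rather than raw probe accuracy, that separates pretrained representations from random baselines in the paper's experiments.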


Related research

05/06/2021 · Bird's Eye: Probing for Linguistic Graph Structures with a Simple Information-Theoretic Approach
NLP has a rich history of representing our prior understanding of langua...

09/08/2019 · Designing and Interpreting Probes with Control Tasks
Probes, supervised models trained to predict properties (like parts-of-s...

04/07/2020 · Information-Theoretic Probing for Linguistic Structure
The success of neural networks on a diverse set of NLP tasks has led res...

09/13/2021 · Not All Models Localize Linguistic Knowledge in the Same Place: A Layer-wise Probing on BERToids' Representations
Most of the recent works on probing representations have focused on BERT...

09/15/2020 · An information theoretic view on selecting linguistic probes
There is increasing interest in assessing the linguistic knowledge encod...

09/15/2020 · Evaluating representations by the complexity of learning low-loss predictors
We consider the problem of evaluating representations of data for use in...

05/09/2022 · EigenNoise: A Contrastive Prior to Warm-Start Representations
In this work, we present a naive initialization scheme for word vectors ...
