The Architectural Bottleneck Principle

11/11/2022
by Tiago Pimentel et al.

In this paper, we seek to measure how much information a component in a neural network could extract from the representations fed into it. Our work stands in contrast to prior probing work, most of which investigates how much information a model's representations contain. This shift in perspective leads us to propose a new principle for probing, the architectural bottleneck principle: In order to estimate how much information a given component could extract, a probe should look exactly like the component. Relying on this principle, we estimate how much syntactic information is available to transformers through our attentional probe, a probe that exactly resembles a transformer's self-attention head. Experimentally, we find that, in three models (BERT, ALBERT, and RoBERTa), a sentence's syntax tree is mostly extractable by our probe, suggesting these models have access to syntactic information while composing their contextual representations. Whether this information is actually used by these models, however, remains an open question.
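To make the principle concrete, here is a minimal sketch of a probe shaped like a single self-attention head, applied to frozen contextual representations. This is an illustrative reconstruction under stated assumptions, not the paper's released code: the class name `AttentionalProbe`, the dimensions, and the head-selection training objective are ours for exposition. The idea is that the probe's attention logits are read as arc scores, so that each token's distribution over positions predicts its syntactic head.

```python
import torch
import torch.nn as nn

class AttentionalProbe(nn.Module):
    """Hypothetical probe mirroring one transformer self-attention head.

    Per the architectural bottleneck principle, the probe looks exactly
    like the component under study: query/key projections followed by
    scaled dot-product scoring. All names and sizes are illustrative.
    """

    def __init__(self, hidden_dim: int = 768, head_dim: int = 64):
        super().__init__()
        self.query = nn.Linear(hidden_dim, head_dim)
        self.key = nn.Linear(hidden_dim, head_dim)
        self.scale = head_dim ** -0.5

    def forward(self, reps: torch.Tensor) -> torch.Tensor:
        # reps: (batch, seq_len, hidden_dim), frozen BERT-like states
        q = self.query(reps)   # (batch, seq_len, head_dim)
        k = self.key(reps)     # (batch, seq_len, head_dim)
        # Attention logits double as arc scores: entry (i, j) scores
        # token j as the syntactic head of token i.
        return torch.einsum("bid,bjd->bij", q, k) * self.scale


# Training-step sketch: maximize the probability of each token's gold head.
probe = AttentionalProbe()
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
reps = torch.randn(2, 10, 768)              # stand-in for frozen encoder states
gold_heads = torch.randint(0, 10, (2, 10))  # gold dependency-head indices
logits = probe(reps)                        # (batch, seq, seq) arc scores
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), gold_heads.reshape(-1)
)
loss.backward()
optimizer.step()
```

On this reading, high accuracy at recovering gold heads estimates how much syntactic information a real attention head could, in principle, extract from the same inputs, which is exactly the quantity the abstract describes.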
