Probing BERT's priors with serial reproduction chains

02/24/2022
by Takateru Yamakoshi et al.

We can learn as much about language models from what they generate as from their performance on targeted benchmarks. Sampling is a promising bottom-up method for probing, but generating representative samples from successful masked language models like BERT remains challenging. Taking inspiration from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to probe BERT's priors. Although the masked language modeling objective does not guarantee a consistent joint distribution, we observe that a unique and consistent estimator of the ground-truth joint distribution may be obtained by a Generative Stochastic Network (GSN) sampler, which randomly selects which word to mask and reconstruct on each step. We compare the lexical and syntactic statistics of sentences drawn from the resulting prior against those of the ground-truth corpus distribution, and we elicit a large empirical sample of naturalness judgments to investigate exactly how the model deviates from human speakers. Our findings suggest the need to move beyond top-down evaluation methods toward bottom-up probing to capture the full richness of what has been learned about language.

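To make the sampling procedure concrete, here is a minimal sketch of one GSN-style serial reproduction chain using a pretrained BERT from the HuggingFace transformers library. The checkpoint, seed sentence, and chain length are illustrative assumptions, not the paper's exact configuration; the key idea is simply that each step masks one randomly chosen token and resamples it from BERT's conditional distribution.

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    # Illustrative setup (assumption: bert-base-uncased; the paper's exact
    # checkpoint and chain settings may differ).
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def gsn_step(ids):
        """Mask one random non-special position and resample it from
        BERT's conditional distribution p(w_i | w_-i)."""
        ids = ids.clone()
        pos = torch.randint(1, ids.size(1) - 1, (1,)).item()  # skip [CLS]/[SEP]
        ids[0, pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=ids).logits
        probs = torch.softmax(logits[0, pos], dim=-1)
        ids[0, pos] = torch.multinomial(probs, num_samples=1).item()
        return ids

    # Run the chain from an arbitrary seed sentence; after burn-in, the
    # visited sentences are treated as samples from the model's prior.
    ids = tokenizer("the cat sat on the mat .", return_tensors="pt")["input_ids"]
    for _ in range(1000):
        ids = gsn_step(ids)
    print(tokenizer.decode(ids[0], skip_special_tokens=True))

Because sentence length is fixed by the tokenized seed in this sketch, a fuller implementation would also need a way to vary length across the chain; the snippet only illustrates the mask-and-reconstruct transition at the heart of the sampler.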