
BLiMP: A Benchmark of Linguistic Minimal Pairs for English

by Alex Warstadt, et al.

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
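To make the minimal-pair setup concrete, here is a small sketch of how such an evaluation can be scored. It assumes that a model counts as correct on a pair when it assigns a higher score to the acceptable sentence than to the unacceptable one; the abstract above does not spell out the scoring rule, so treat that as an assumption. The add-one smoothed bigram model, the four-sentence corpus, the two example pairs, and the function names (`train_bigram`, `log_prob`, `minimal_pair_accuracy`) are all hypothetical stand-ins, not BLiMP data or the authors' code.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence (toy scorer)."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams) + 1  # +1 reserves mass for unseen tokens
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def minimal_pair_accuracy(pairs, score):
    """Fraction of (acceptable, unacceptable) pairs where the acceptable
    sentence gets the strictly higher score. Assumed scoring rule."""
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


# Hypothetical toy training corpus and minimal pairs (subject-verb agreement).
corpus = ["the cats sleep", "the cat sleeps", "the dogs bark", "the dog barks"]
pairs = [
    ("the cats sleep", "the cats sleeps"),
    ("the dog barks", "the dog bark"),
]

unigrams, bigrams = train_bigram(corpus)
accuracy = minimal_pair_accuracy(pairs, lambda s: log_prob(s, unigrams, bigrams))
print(f"minimal-pair accuracy: {accuracy:.2f}")
```

Because each pair differs in a single word, any difference in score can be attributed to the contrast being tested. Swapping the `score` callable for a neural LM's sentence log-probability would leave `minimal_pair_accuracy` unchanged.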




CLiMP: A Benchmark for Chinese Language Model Evaluation

Linguistically informed analyses of language models (LMs) contribute to ...

SLING: Sino Linguistic Evaluation of Large Language Models

To understand what kinds of linguistic knowledge are encoded by pretrain...

Acceptability Judgements via Examining the Topology of Attention Maps

The role of the attention mechanism in encoding linguistic knowledge has...

DALL-E 2 Fails to Reliably Capture Common Syntactic Processes

Machine intelligence is increasingly being linked to claims about sentie...

Indicatements that character language models learn English morpho-syntactic units and regularities

Character language models have access to surface morphological patterns,...

Evaluating German Transformer Language Models with Syntactic Agreement Tests

Pre-trained transformer language models (TLMs) have recently refashioned...

On the Limits of Minimal Pairs in Contrastive Evaluation

Minimal sentence pairs are frequently used to analyze the behavior of la...