
BLiMP: A Benchmark of Linguistic Minimal Pairs for English

12/02/2019
by Alex Warstadt, et al.

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items and subtle syntactic phenomena such as extraction islands.
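
As a rough illustration of how a minimal-pair benchmark can be scored, the sketch below counts a model as correct on a pair when it assigns a higher log-probability to the acceptable sentence than to the unacceptable one. The abstract does not state the scoring rule, so this is an assumption. The add-one-smoothed bigram model, the function names (`train_bigram`, `log_prob`, `minimal_pair_accuracy`), and the toy sentences are all hypothetical stand-ins, not BLiMP data or the authors' code.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Collect bigram and context counts from whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Sentence log-probability under an add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def minimal_pair_accuracy(pairs, score):
    """Fraction of (acceptable, unacceptable) pairs where the acceptable one scores higher."""
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Toy training corpus and toy agreement pairs (invented for illustration only).
corpus = ["the cat sleeps", "the cats sleep", "the dog barks", "the dogs bark"]
pairs = [
    ("the cats sleep", "the cats sleeps"),
    ("the dog barks", "the dog bark"),
]

model = train_bigram(corpus)
accuracy = minimal_pair_accuracy(pairs, lambda s: log_prob(s, model))
print(accuracy)
```

Any function that maps a sentence to a score can be passed as `score`, so the same accuracy loop would apply to a neural LM that returns summed token log-probabilities.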


01/26/2021

CLiMP: A Benchmark for Chinese Language Model Evaluation

Linguistically informed analyses of language models (LMs) contribute to ...
10/21/2022

SLING: Sino Linguistic Evaluation of Large Language Models

To understand what kinds of linguistic knowledge are encoded by pretrain...
05/19/2022

Acceptability Judgements via Examining the Topology of Attention Maps

The role of the attention mechanism in encoding linguistic knowledge has...
10/23/2022

DALL-E 2 Fails to Reliably Capture Common Syntactic Processes

Machine intelligence is increasingly being linked to claims about sentie...
08/31/2018

Indicatements that character language models learn English morpho-syntactic units and regularities

Character language models have access to surface morphological patterns,...
07/07/2020

Evaluating German Transformer Language Models with Syntactic Agreement Tests

Pre-trained transformer language models (TLMs) have recently refashioned...
09/15/2021

On the Limits of Minimal Pairs in Contrastive Evaluation

Minimal sentence pairs are frequently used to analyze the behavior of la...