Refining Targeted Syntactic Evaluation of Language Models

04/19/2021
by Benjamin Newman, et al.

Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models' syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation. The method evaluates whether language models rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, evaluating the systematicity of a language model's syntactic knowledge: given a sentence, can it conjugate arbitrary verbs correctly? Second, evaluating a model's likely behavior: given a sentence, does the model concentrate its probability mass on correctly conjugated verbs, even if only on a subset of the possible verbs? We argue that current implementations of TSE do not directly capture either of these goals, and propose new metrics to capture each goal separately. Under our metrics, we find that TSE overestimates systematicity of language models, but that models score up to 40% better on verbs that they predict are likely in context.
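
The distinction between the two goals can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the probabilities are invented for a hypothetical model, and the function names (`pair_correct`, `systematicity_score`, `likely_behavior_score`) and the two scores they compute are one plausible reading of the goals described above, not the metrics proposed in the paper.

```python
# Toy next-word probabilities that a hypothetical model assigns after the
# context "The keys to the cabinet ___" (plural subject).
# All numbers are invented for illustration; no real model is queried.
probs = {
    "are": 0.30, "is": 0.10,
    "exist": 0.05, "exists": 0.01,
    "rust": 0.001, "rusts": 0.002,
}

# (correct plural form, incorrect singular form) for each verb lemma.
verb_pairs = [("are", "is"), ("exist", "exists"), ("rust", "rusts")]


def pair_correct(probs, good, bad):
    """Minimal-pair check: is the grammatical form scored higher?"""
    return probs[good] > probs[bad]


def systematicity_score(probs, pairs):
    """Fraction of verb pairs ranked correctly, every verb counted equally."""
    return sum(pair_correct(probs, g, b) for g, b in pairs) / len(pairs)


def likely_behavior_score(probs, pairs):
    """Share of the probability mass on these verbs that lands on correct forms."""
    good_mass = sum(probs[g] for g, _ in pairs)
    bad_mass = sum(probs[b] for _, b in pairs)
    return good_mass / (good_mass + bad_mass)


# A single hand-picked pair, as in a standard minimal-pair test.
tse_single = pair_correct(probs, "are", "is")
sys_score = systematicity_score(probs, verb_pairs)
beh_score = likely_behavior_score(probs, verb_pairs)

print(tse_single, round(sys_score, 3), round(beh_score, 3))
```

In this toy setup the single hand-picked pair is ranked correctly, while the equally weighted score is lower because the rare verb is misconjugated, and the mass-weighted score sits in between because the rare verb contributes little probability. That is the kind of gap between goals that the abstract describes.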


