Investigating representations of verb bias in neural language models

10/05/2020
by Robert D. Hawkins, et al.

Languages typically provide more than one grammatical construction to express certain types of messages. A speaker's choice of construction is known to depend on multiple factors, including the choice of main verb, a phenomenon known as verb bias. Here we introduce DAIS, a large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments. We use this dataset, as well as an existing corpus of naturally occurring data, to evaluate how well recent neural language models capture human preferences. Results show that larger models perform better than smaller models, and that transformer architectures (e.g., GPT-2) tend to outperform recurrent architectures (e.g., LSTMs) even under comparable parameter and training settings. Additional analyses of internal feature representations suggest that transformers may better integrate specific lexical information with grammatical constructions.
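The abstract does not spell out how model preferences are scored, so the following is only a minimal sketch under stated assumptions. It assumes that a model's preference within a dative pair is the difference in total log-probability between the prepositional-object (PO) variant ("gave the book to Ana") and the double-object (DO) variant ("gave Ana the book"), and that agreement with people is measured by Pearson correlation against human ratings. The function names, the 0 to 100 rating scale (higher favoring PO), and all numbers are hypothetical; per-token log-probabilities are assumed to come from a language model computed elsewhere.

```python
import math
from typing import Sequence


def sentence_logprob(token_logprobs: Sequence[float]) -> float:
    """Total log-probability of a sentence: the sum of its per-token log-probs."""
    return sum(token_logprobs)


def model_preference(po_logprobs: Sequence[float], do_logprobs: Sequence[float]) -> float:
    """Hypothetical score: positive means the model favors the PO variant over the DO variant."""
    return sentence_logprob(po_logprobs) - sentence_logprob(do_logprobs)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between model preferences and human ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sd_x * sd_y)


# Toy (PO, DO) per-token log-probs for three hypothetical sentence pairs.
# These numbers are made up for illustration and are not taken from DAIS.
pairs = [
    ([-1.0, -2.0, -1.0], [-1.0, -1.0]),  # model favors DO
    ([-1.0, -1.0], [-2.0, -2.0]),        # model favors PO
    ([-1.5, -1.5], [-1.0, -2.0]),        # model is indifferent
]
human_ratings = [20.0, 80.0, 50.0]  # hypothetical mean ratings, higher favors PO

preferences = [model_preference(po, do) for po, do in pairs]
r = pearson_r(preferences, human_ratings)
print(preferences, round(r, 3))
```

Summing raw log-probabilities tends to penalize the PO variant, which is usually one token longer because of "to"; a length-normalized score or a rank correlation are alternatives one might try instead.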
