What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models?

05/31/2023
by Ruixiang Cui, et al.

Humans can effortlessly understand the coordinate structure of sentences such as "Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively". In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset, WikiResNLI, and a naturally occurring dataset, NatResNLI, to encompass various explicit and implicit realizations of "respectively". We show that fine-tuned NLI models struggle to understand such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on commonsense inferences. Furthermore, our fine-grained analysis indicates that models fail to generalize across different constructions. In conclusion, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.
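As a concrete illustration of the task, the following minimal sketch (ours, not the authors' code) feeds an NLI premise/hypothesis pair built around the abstract's example to an off-the-shelf NLI model. The Hugging Face transformers library and the publicly available roberta-large-mnli checkpoint are assumptions for illustration, not models the paper reports evaluating.

    # A minimal sketch, not the paper's code: probing an off-the-shelf NLI
    # model with a "respectively" example. The roberta-large-mnli checkpoint
    # and the transformers API are assumptions made for illustration.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
    model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    premise = ("Niels Bohr and Kurt Cobain were born in "
               "Copenhagen and Seattle, respectively.")
    # Entailed under the respective reading; a model that pairs the
    # coordinated phrases incorrectly may predict contradiction instead.
    hypothesis = "Kurt Cobain was born in Seattle."

    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order for this checkpoint: 0=contradiction, 1=neutral, 2=entailment.
    print(model.config.id2label[logits.argmax(dim=-1).item()])

A model that resolves the respective reading correctly should predict entailment here; with the cities swapped in the hypothesis, the prediction should flip to contradiction.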

Related research

10/31/2021
A Systematic Investigation of Commonsense Understanding in Large Language Models
Large language models have shown impressive performance on many natural ...

12/13/2022
A fine-grained comparison of pragmatic language understanding in humans and language models
Pragmatics is an essential part of communication, but it remains unclear...

05/24/2023
ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories
Recently, Pretrained Language Models (PLMs) have been serving as general...

06/07/2020
Language Models as Fact Checkers?
Recent work has suggested that language models (LMs) store both common-s...

08/08/2019
Do Neural Language Representations Learn Physical Commonsense?
Humans understand language based on the rich background knowledge about ...

07/11/2023
Synthetic Dataset for Evaluating Complex Compositional Knowledge for Natural Language Inference
We introduce a synthetic dataset called Sentences Involving Complex Comp...

05/23/2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-3.5) are uniquely capable of producing ...
