Testing the limits of natural language models for predicting human language judgments

04/07/2022
by   Tal Golan, et al.

Neural network language models can serve as computational hypotheses about how humans process language. We compared the model-human consistency of diverse language models using a novel experimental approach: controversial sentence pairs. For each controversial sentence pair, two language models disagree about which sentence is more likely to occur in natural text. Considering nine language models (including n-gram, recurrent neural network, and transformer models), we created hundreds of such controversial sentence pairs, either by selecting sentences from a corpus or by synthetically optimizing sentence pairs to be highly controversial. Human subjects then judged, for each pair, which of the two sentences is more likely. Controversial sentence pairs proved highly effective at revealing model failures and at identifying the models that aligned most closely with human judgments. The most human-consistent model tested was GPT-2, although the experiments also revealed significant shortcomings in its alignment with human judgments.
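The selection criterion described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it stands in two toy unigram scorers for the real n-gram, RNN, and transformer models, and searches a small corpus for pairs on which the two scorers disagree about which sentence is more probable.

```python
import math
from itertools import combinations

def make_unigram_model(counts):
    """Toy 'language model': score a sentence by the sum of unigram
    log-probabilities of its words (unseen words get a count of 1).
    Purely illustrative; any sentence-level log-probability works here."""
    total = sum(counts.values())
    return lambda sentence: sum(
        math.log(counts.get(w, 1) / total) for w in sentence.split()
    )

# Two hypothetical models with different word statistics.
model_a = make_unigram_model({"the": 10, "cat": 5, "sat": 3, "mat": 2})
model_b = make_unigram_model({"the": 10, "cat": 2, "sat": 5, "mat": 5})

def find_controversial_pairs(sentences, m1, m2):
    """Return pairs (s, t) on which the models disagree:
    m1 assigns higher probability to s, while m2 prefers t."""
    pairs = []
    for s, t in combinations(sentences, 2):
        if m1(s) > m1(t) and m2(s) < m2(t):
            pairs.append((s, t))
        elif m1(s) < m1(t) and m2(s) > m2(t):
            pairs.append((t, s))
    return pairs

corpus = ["the cat", "the mat", "the sat"]
controversial = find_controversial_pairs(corpus, model_a, model_b)
```

Human judgments on such pairs are then diagnostic: whichever sentence the subjects prefer, at least one of the two models is contradicted.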


