An Application of Pseudo-Log-Likelihoods to Natural Language Scoring

01/23/2022
by Darren Abramson, et al.

Language models built using semi-supervised machine learning on large corpora of natural language have very quickly come to dominate the fields of natural language generation and understanding. In this paper we apply a zero-shot approach, developed independently by a number of researchers, that is now gaining recognition as a significant alternative to fine-tuning for evaluation on common sense tasks. A language model with relatively few parameters and training steps can outperform a more recent and larger language model (T5) on a recent large data set (TimeDial), while displaying robust performance across a similar class of language tasks. Surprisingly, this result is achieved by applying a hyperparameter-free zero-shot method to the smaller model, whereas the larger model is fine-tuned. We argue that the robustness of the smaller model ought to be understood in terms of compositionality, in a sense that we draw from recent literature on a class of similar models. We identify a practical cost for our method and model: high GPU-time for natural language evaluation. The zero-shot measurement technique that produces remarkable stability, both for ALBERT and for other BERT variants, is an application of pseudo-log-likelihoods to masked language models, used to measure the relative probability of substitution alternatives in forced-choice language tasks such as the Winograd Schema Challenge, Winogrande, and others. One contribution of this paper is to bring together a number of similar but independent strands of research. We produce some absolute state-of-the-art results for common sense reasoning in binary choice tasks, performing better than any published result in the literature, including fine-tuned efforts. We show a remarkable consistency of the model's performance under adversarial settings, which we argue is best explained by the compositionality of the model's representations.
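As an illustration of the scoring idea named above, here is a minimal sketch of pseudo-log-likelihood (PLL) forced choice, assuming the standard definition PLL(W) = sum over positions t of log P(w_t | W with w_t masked), summed over every token. The abstract does not say which tokens the authors score or whether they normalize, so this is not presented as their exact procedure. The scorer interface `MaskedLogProb` and the helper `make_toy_masked_lm` are hypothetical. The toy "masked LM" only counts which words appear between the same left and right neighbours in a tiny corpus. In practice it would be replaced by a real masked language model such as ALBERT or BERT, with one forward pass per masked position.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

MASK = "[MASK]"

# Hypothetical scorer interface: (tokens with one position masked, that position,
# the original token) -> log P(original token | unmasked context).
MaskedLogProb = Callable[[Sequence[str], int, str], float]


def pseudo_log_likelihood(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    """PLL(W): sum over positions t of log P(w_t | W with w_t masked)."""
    total = 0.0
    for t, word in enumerate(tokens):
        masked = list(tokens)
        masked[t] = MASK  # one masked copy (one model call) per token
        total += masked_log_prob(masked, t, word)
    return total


def forced_choice(candidates: List[Sequence[str]], masked_log_prob: MaskedLogProb) -> int:
    """Return the index of the candidate with the highest PLL; no training involved."""
    scores = [pseudo_log_likelihood(c, masked_log_prob) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def make_toy_masked_lm(corpus: List[List[str]]) -> MaskedLogProb:
    """Hypothetical stand-in for a real masked LM: scores a word by how often it
    occurs between the same left and right neighbours in a tiny corpus, with
    add-one smoothing over the corpus vocabulary."""
    vocab = sorted({w for sent in corpus for w in sent})
    counts: Counter = Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i], padded[i + 1])] += 1

    def masked_log_prob(masked: Sequence[str], t: int, word: str) -> float:
        padded = ["<s>"] + list(masked) + ["</s>"]
        left, right = padded[t], padded[t + 2]  # neighbours of the masked slot
        denom = sum(counts[(left, v, right)] for v in vocab) + len(vocab)
        return math.log((counts[(left, word, right)] + 1) / denom)

    return masked_log_prob


if __name__ == "__main__":
    corpus = [
        ["the", "trophy", "is", "too", "big"],
        ["the", "trophy", "is", "too", "big"],
        ["the", "suitcase", "is", "too", "small"],
    ]
    lm = make_toy_masked_lm(corpus)
    # Two substitution alternatives that differ only in the candidate noun.
    options = [
        ["the", "trophy", "is", "too", "big"],
        ["the", "suitcase", "is", "too", "big"],
    ]
    print([round(pseudo_log_likelihood(o, lm), 3) for o in options])
    print("chosen:", forced_choice(options, lm))
```

The method has no hyperparameters to tune: each alternative is scored and the higher PLL wins. The loop in `pseudo_log_likelihood` calls the model once per token. With a real transformer, each call is a full forward pass, which is consistent with the high evaluation-time GPU cost noted in the abstract.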

