Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries

08/22/2023
by Noel Ngu, et al.

Error prediction in large language models often relies on domain-specific information. In this paper, we present measures that quantify error in a large language model's response based on the diversity of its responses to a given prompt, and which are therefore independent of the underlying application. We describe how three such measures, based on entropy, Gini impurity, and centroid distance, can be employed. Through a suite of experiments across multiple datasets and temperature settings, we demonstrate that these measures strongly correlate with the probability of failure. Additionally, we present empirical results showing how these measures can be applied to few-shot prompting, chain-of-thought reasoning, and error detection.
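The three measures lend themselves to a short illustration. Below is a minimal sketch, not the paper's reference implementation, of how entropy, Gini impurity, and centroid distance could be computed over k responses sampled for the same prompt. The example response strings and the assumption that embeddings are plain equal-length vectors are illustrative choices, not details taken from the paper.

```python
# Hedged sketch of the three diversity measures named in the abstract.
# Entropy and Gini impurity are taken over the empirical distribution of
# distinct response strings; centroid distance operates on response
# embeddings (e.g., from any sentence-embedding model).
from collections import Counter
import math

def entropy(responses):
    """Shannon entropy of the empirical distribution of responses."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gini_impurity(responses):
    """Gini impurity of the empirical distribution of responses."""
    counts = Counter(responses)
    n = len(responses)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def mean_centroid_distance(embeddings):
    """Mean Euclidean distance from each response embedding to their centroid.

    `embeddings` is a list of equal-length numeric vectors, one per response.
    """
    dim = len(embeddings[0])
    centroid = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return sum(math.dist(e, centroid) for e in embeddings) / len(embeddings)

# Hypothetical k=4 samples for one prompt: low diversity -> low scores.
responses = ["Paris", "Paris", "Lyon", "Paris"]
print(entropy(responses))        # ~0.811 bits
print(gini_impurity(responses))  # 0.375
```

Under this scheme, a prompt whose sampled responses are all identical scores zero on all three measures, while widely dispersed responses score high; that monotone relationship is what lets the measures serve as domain-independent proxies for failure probability.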


