Quantifying Uncertainty in Answers from any Language Model via Intrinsic and Extrinsic Confidence Assessment

08/30/2023

by Jiuhai Chen, et al.

We introduce BSDetector, a method for detecting bad and speculative answers from a pretrained Large Language Model by estimating a numeric confidence score for any output it generates. Our uncertainty quantification technique works for any LLM accessible only via a black-box API, and combines intrinsic and extrinsic assessments of confidence into a single trustworthiness estimate for any LLM response to a given prompt. Our method is extremely general and can be applied to all of the best LLMs available today (whose training data remains unknown). By expending a bit of extra computation, users of any LLM API can now get the same response as they would ordinarily, along with a confidence estimate that cautions them when not to trust this response. Experiments on both closed- and open-form Question-Answer benchmarks reveal that BSDetector identifies incorrect LLM responses more accurately than alternative uncertainty estimation procedures (for both GPT-3 and ChatGPT). By sampling multiple responses from the LLM and returning the one with the highest confidence score, we can additionally obtain more accurate responses from the same LLM, without any extra training steps.
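The abstract describes combining an extrinsic assessment (consistency among resampled answers) with an intrinsic assessment (the model reflecting on its own answer), then using the resulting score to pick the most trustworthy of several candidate responses. Below is a minimal sketch of that general idea, not the paper's exact procedure: the `query_llm` callable, the yes/maybe/no reflection prompt, the exact-match agreement metric, and the 0.7/0.3 weighting are all illustrative assumptions.

```python
from typing import Callable, Tuple


def confidence_score(
    query_llm: Callable[[str, float], str],  # hypothetical black-box LLM API wrapper: (prompt, temperature) -> text
    prompt: str,
    answer: str,
    num_samples: int = 5,
    weight_extrinsic: float = 0.7,  # assumed weighting; the paper's combination may differ
) -> float:
    """Estimate a trustworthiness score in [0, 1] for `answer` to `prompt`."""
    # Extrinsic assessment: sample additional responses at nonzero temperature
    # and measure how often they agree with the candidate answer.
    samples = [query_llm(prompt, 1.0) for _ in range(num_samples)]
    agreement = sum(
        s.strip().lower() == answer.strip().lower() for s in samples
    ) / num_samples

    # Intrinsic assessment: ask the same model to reflect on whether the answer is correct.
    reflection_prompt = (
        f"Question: {prompt}\nProposed answer: {answer}\n"
        "Is the proposed answer correct? Reply with one word: yes, maybe, or no."
    )
    reflection = query_llm(reflection_prompt, 0.0).strip().lower()
    intrinsic = {"yes": 1.0, "maybe": 0.5, "no": 0.0}.get(reflection, 0.5)

    # Combine the two assessments into a single confidence estimate.
    return weight_extrinsic * agreement + (1.0 - weight_extrinsic) * intrinsic


def best_of_n_answer(
    query_llm: Callable[[str, float], str],
    prompt: str,
    num_candidates: int = 3,
) -> Tuple[str, float]:
    """Sample several candidate answers and return the one with the highest confidence score."""
    candidates = [query_llm(prompt, 1.0) for _ in range(num_candidates)]
    scored = [(c, confidence_score(query_llm, prompt, c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])
```

Because scoring only requires extra calls to the same black-box API, this kind of wrapper needs no access to model weights or logits, which is why the approach works for closed LLM APIs.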
