Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

05/25/2023
by Niels Mündler, et al.

Large language models (large LMs) are susceptible to producing text with hallucinated content. Self-contradiction, where the LM generates two contradictory sentences within the same context, is an important form of hallucination. In this work, we present a comprehensive analysis of self-contradiction for state-of-the-art, instruction-tuned LMs, covering evaluation, detection, and mitigation. To effectively trigger self-contradictions, we design a framework that constrains LMs to generate appropriate sentence pairs. Our evaluation on these sentence pairs reveals that self-contradictions occur frequently across different LMs for both famous and lesser-known topics. Next, we prompt the LMs to detect self-contradictions. Our results indicate that ChatGPT and GPT-4 are able to accurately identify self-contradictions, while Vicuna-13B struggles to do so. For example, with our best prompting method, ChatGPT achieves 91.0% precision on sentence pairs generated by itself. To automatically mitigate self-contradictions, we develop an iterative algorithm that prompts the LMs to remove the detected self-contradictions from the generated text. Our algorithm successfully revises the text such that self-contradictions are significantly reduced, while maintaining its fluency and informativeness. Importantly, our entire pipeline of triggering, detecting, and mitigating self-contradictions is applicable to black-box LMs and does not require any external grounded knowledge.
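The abstract describes a fully black-box pipeline: constrain the LM to produce a second sentence for the same context, prompt an LM to judge whether the resulting pair is contradictory, and iteratively ask the LM to revise the text until no contradictions are detected. The sketch below is a minimal illustration of such a loop, not the authors' implementation: the prompt wordings, the `chat` placeholder, the naive sentence splitting, and the round limit are all assumptions made for illustration.

```python
# Minimal sketch of a trigger -> detect -> mitigate loop for self-contradictions.
# The prompts, the `chat` helper, and the revision criterion are illustrative
# assumptions, not the paper's implementation.

def chat(prompt: str) -> str:
    """Placeholder for a black-box chat LM call (e.g. an OpenAI-style client)."""
    raise NotImplementedError("plug in your LM client here")

def trigger_pair(context: str, sentence: str) -> tuple[str, str]:
    """Ask the LM to re-generate a sentence for the same context, yielding a
    pair that can be checked for self-contradiction."""
    alt = chat(
        f"{context}\nContinue the text above with one sentence describing the "
        f'same fact as: "{sentence}"'
    )
    return sentence, alt

def is_contradictory(context: str, s1: str, s2: str) -> bool:
    """Prompt the LM to judge whether two sentences contradict each other."""
    verdict = chat(
        f"Context: {context}\nSentence A: {s1}\nSentence B: {s2}\n"
        "Do these two sentences contradict each other? Answer Yes or No."
    )
    return verdict.strip().lower().startswith("yes")

def mitigate(context: str, text: str, max_rounds: int = 3) -> str:
    """Iteratively revise the text until no self-contradictions are detected
    (or a round limit is hit), keeping the revision fluent and informative."""
    for _ in range(max_rounds):
        # Naive sentence split; a real pipeline would use a proper tokenizer.
        sentences = [s.strip() for s in text.split(". ") if s.strip()]
        flagged = []
        for s in sentences:
            s1, s2 = trigger_pair(context, s)
            if is_contradictory(context, s1, s2):
                flagged.append((s1, s2))
        if not flagged:
            break  # no detected self-contradictions remain
        listing = "\n".join(f'- "{a}" vs. "{b}"' for a, b in flagged)
        text = chat(
            f"The following text contains self-contradictions:\n{text}\n\n"
            f"Contradictory pairs:\n{listing}\n\n"
            "Rewrite the text to remove the contradictory information while "
            "staying fluent and keeping all non-contradictory content."
        )
    return text
```

Because every step is itself a prompt to the same black-box model, the loop needs no external knowledge base; the quality of detection and revision rests entirely on the prompt wording and on the capability of the judging model, which matches the abstract's observation that ChatGPT and GPT-4 detect self-contradictions far more reliably than Vicuna-13B.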

Related research

07/08/2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Recently developed large language models have achieved remarkable succes...

12/16/2022
Self-Prompting Large Language Models for Open-Domain QA
Open-Domain Question Answering (ODQA) requires models to answer factoid ...

04/07/2022
Testing the limits of natural language models for predicting human language judgments
Neural network language models can serve as computational hypotheses abo...

09/06/2023
Zero-Resource Hallucination Prevention for Large Language Models
The prevalent use of large language models (LLMs) in various domains has...

02/28/2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
When trained on large, unfiltered crawls from the internet, language mod...

05/24/2023
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
We introduce Ghostbuster, a state-of-the-art system for detecting AI-gen...

03/15/2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Generative Large Language Models (LLMs) such as GPT-3 are capable of gen...
