Mutual Information Alleviates Hallucinations in Abstractive Summarization

10/24/2022
by Liam van der Poel, et al.

Despite significant progress in the quality of language generated from abstractive summarization models, these models still exhibit the tendency to hallucinate, i.e., output content not supported by the source document. A number of works have tried to fix, or at least uncover the source of, the problem with limited success. In this paper, we identify a simple criterion under which models are significantly more likely to assign more probability to hallucinated content during generation: high model uncertainty. This finding offers a potential explanation for hallucinations: models default to favoring text with high marginal probability, i.e., high-frequency occurrences in the training set, when uncertain about a continuation. It also motivates possible routes for real-time intervention during decoding to prevent such hallucinations. We propose a decoding strategy that switches to optimizing for pointwise mutual information of the source and target token, rather than purely the probability of the target token, when the model exhibits uncertainty. Experiments on the XSum dataset show that our method decreases the probability of hallucinated tokens while maintaining the Rouge and BertS scores of top-performing decoding strategies.

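To make the proposed switch concrete, below is a minimal, illustrative sketch of a single greedy decoding step under these assumptions: the summarizer's conditional next-token log-probabilities and an unconditional language model's marginal log-probabilities are already available as tensors. The function name pmi_decode_step, the entropy threshold tau, and the weight lam are hypothetical choices for illustration, not the paper's exact formulation.

```python
import torch

def pmi_decode_step(cond_logprobs, marg_logprobs, tau=3.0, lam=1.0):
    """Pick the next token for one decoding step (illustrative sketch).

    cond_logprobs: log p(y_t | y_<t, x) -- summarizer's next-token log-probs, shape (vocab,)
    marg_logprobs: log p(y_t | y_<t)    -- unconditional LM's next-token log-probs, shape (vocab,)
    tau: entropy threshold used as the "model uncertainty" trigger (illustrative value)
    lam: weight on the marginal term (illustrative value)
    """
    # Entropy of the conditional next-token distribution; high entropy = uncertain model.
    probs = cond_logprobs.exp()
    entropy = -(probs * cond_logprobs).sum()

    if entropy > tau:
        # Uncertain: score tokens by a pointwise-mutual-information-style objective,
        # log p(y | x, context) - lam * log p(y | context), which penalizes tokens
        # that are merely frequent under the unconditional language model.
        scores = cond_logprobs - lam * marg_logprobs
    else:
        # Confident: fall back to ordinary likelihood-based scoring.
        scores = cond_logprobs

    return int(scores.argmax())


# Toy usage with random distributions over a 10-token vocabulary.
cond = torch.log_softmax(torch.randn(10), dim=-1)
marg = torch.log_softmax(torch.randn(10), dim=-1)
print(pmi_decode_step(cond, marg))
```

In practice such a criterion would be applied inside beam search rather than greedy decoding, and tau and lam would be tuned on validation data; this sketch only shows the core idea of switching objectives under uncertainty.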
Related research

Lightweight Decoding Strategies for Increasing Specificity (10/22/2021)
Language models are known to produce vague and generic outputs. We propo...

Learning with Rejection for Abstractive Text Summarization (02/16/2023)
State-of-the-art abstractive summarization systems frequently hallucinat...

Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation (03/06/2022)
Token-level adaptive training approaches can alleviate the token imbalan...

Understanding Neural Abstractive Summarization Models via Uncertainty (10/15/2020)
An advantage of seq2seq abstractive summarization models is that they ge...

Mutual Information and Diverse Decoding Improve Neural Machine Translation (01/04/2016)
Sequence-to-sequence neural translation models learn semantic and syntac...

Source-side Prediction for Neural Headline Generation (12/22/2017)
The encoder-decoder model is widely used in natural language generation ...

Calibrating Sequence Likelihood Improves Conditional Language Generation (09/30/2022)
Conditional language models are predominantly trained with maximum likel...
