With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness

05/26/2023
by   Julius Steen, et al.

Conditional language models still generate unfaithful output that is not supported by their input. These unfaithful generations jeopardize trust in real-world applications such as summarization or human-machine interaction, motivating a need for automatic faithfulness metrics. To implement such metrics, NLI models seem attractive, since they solve a strongly related task that comes with a wealth of prior research and data. But recent research suggests that NLI models require costly additional machinery to perform reliably across datasets, e.g., by running inference on a Cartesian product of input and generated sentences, or by supporting them with a question-generation/answering step. In this work we show that pure NLI models _can_ outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures. We propose: (1) augmenting NLI training data to adapt NL inferences to the specificities of faithfulness prediction in dialogue; (2) making use of both entailment and contradiction probabilities in NLI; and (3) using Monte-Carlo dropout during inference. Applied to the TRUE benchmark, which combines faithfulness datasets across diverse domains and tasks, our approach strongly improves a vanilla NLI model and significantly outperforms previous work, while showing favourable computational cost.
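The two inference-time ideas in the abstract, combining entailment and contradiction probabilities and sampling with Monte-Carlo dropout, are easy to sketch in code. The snippet below is an illustration rather than the authors' released implementation: the model choice (roberta-large-mnli), the number of dropout samples, and the scoring formula p(entailment) - p(contradiction) are all assumptions made for the example.

```python
# Minimal sketch of NLI-based faithfulness scoring with MC dropout.
# Assumptions (not from the paper's code release): roberta-large-mnli as
# the NLI model, 8 dropout samples, and score = p(entail) - p(contradict).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-large-mnli"  # label order: 0=contradiction, 1=neutral, 2=entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def faithfulness_score(source: str, generated: str, mc_samples: int = 8) -> float:
    """Score how well `generated` is supported by `source` (premise)."""
    inputs = tokenizer(source, generated, return_tensors="pt", truncation=True)
    model.train()  # keep dropout layers active so each pass is a stochastic sample
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
            for _ in range(mc_samples)
        ]).mean(dim=0)  # average class probabilities over the MC-dropout samples
    # Use both entailment and contradiction evidence; result lies in [-1, 1].
    return (probs[2] - probs[0]).item()

print(faithfulness_score(
    "The meeting was moved to Friday because the CEO was travelling.",
    "The meeting now takes place on Friday.",
))
```

Averaging class probabilities over several stochastic forward passes smooths out the sensitivity of a single prediction, and subtracting the contradiction probability penalizes outputs the model actively disputes rather than merely fails to entail.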


Related research

04/28/2022
Post-Training Dialogue Summarization using Pseudo-Paraphrasing
Previous dialogue summarization techniques adapt large language models p...

12/06/2022
Improved Beam Search for Hallucination Mitigation in Abstractive Summarization
Advancement in large pretrained language models has significantly improv...

05/23/2023
USB: A Unified Summarization Benchmark Across Tasks and Domains
An abundance of datasets exist for training and evaluating models on the...

06/05/2023
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
Even after fine-tuning and reinforcement learning, large language models...

04/07/2023
Why think step-by-step? Reasoning emerges from the locality of experience
Humans have a powerful and mysterious capacity to reason. By working thr...

10/14/2021
MoFE: Mixture of Factual Experts for Controlling Hallucinations in Abstractive Summarization
Neural abstractive summarization models are susceptible to generating fa...

04/11/2022
TRUE: Re-evaluating Factual Consistency Evaluation
Grounded text generation systems often generate text that contains factu...
