Learning to Revise References for Faithful Summarization

04/13/2022
by Griffin Adams, et al.

In many real-world scenarios with naturally occurring datasets, reference summaries are noisy and contain information that cannot be inferred from the source text. On large news corpora, removing low-quality samples has been shown to reduce model hallucinations, yet this method is largely untested on smaller, noisier corpora. To improve reference quality while retaining all data, we propose a new approach: to revise, rather than remove, unsupported reference content. Without ground-truth supervision, we construct synthetic unsupported alternatives to supported sentences and use contrastive learning to encourage faithful revisions and discourage unfaithful ones. At inference, we vary style codes to over-generate revisions of unsupported reference sentences and select a final revision that balances faithfulness and abstraction. We extract a small corpus from a noisy source, the Electronic Health Record (EHR), for the task of summarizing a hospital admission from multiple notes. Training models on original, filtered, and revised references, we find that (1) learning from revised references reduces the hallucination rate substantially more than filtering (18.4% vs. 3.8%), (2) learning from abstractive (rather than extractive) revisions improves coherence, relevance, and faithfulness, and (3) beyond redressing noisy data, revision has standalone value: as a pre-training objective and as a post-hoc editor.
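Two steps of the method lend themselves to short sketches. First, a minimal, hypothetical illustration of the contrastive objective: given a model score for a supported (faithful) revision and one for its synthetic unsupported alternative, a margin loss pushes the faithful candidate higher. The hinge form and the use of log-probabilities as scores are assumptions for illustration, not the authors' exact formulation.

```python
import torch.nn.functional as F

def contrastive_revision_loss(pos_logprob, neg_logprob, margin=1.0):
    """Hinge-style contrastive loss (an assumed form): encourage the model
    to assign higher likelihood to the supported revision (pos_logprob)
    than to its synthetic unsupported alternative (neg_logprob), by at
    least `margin`."""
    return F.relu(margin - (pos_logprob - neg_logprob)).mean()
```

Second, a sketch of the inference-time over-generate-and-select step: revise an unsupported reference sentence under several style codes, then keep the candidate that best trades off faithfulness against abstraction. The helper names (`revise`, `faithfulness_score`, `abstraction_score`) and the weighted-sum selection rule are illustrative stand-ins, not the paper's actual models or criterion.

```python
def select_revision(ref_sentence, source, style_codes,
                    revise, faithfulness_score, abstraction_score,
                    alpha=0.5):
    """Over-generate revisions by varying style codes, then select the
    candidate balancing faithfulness to the source and abstraction from
    the original sentence (assumed here to be a simple weighted sum)."""
    candidates = [revise(ref_sentence, source, code) for code in style_codes]
    return max(
        candidates,
        key=lambda c: alpha * faithfulness_score(c, source)
                      + (1 - alpha) * abstraction_score(c, ref_sentence),
    )
```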


