Correcting Diverse Factual Errors in Abstractive Summarization via Post-Editing and Language Model Infilling
Abstractive summarization models often generate inconsistent summaries containing factual errors or hallucinated content. Recent work focuses on correcting factual errors in generated summaries via post-editing. Such correction models are trained on adversarial non-factual summaries constructed with heuristic rules for injecting errors. However, heuristically generated non-factual summaries often do not generalize well to the errors that models actually make. In this work, we propose to generate hard, representative synthetic examples of non-factual summaries through infilling language models. With this data, we train a more robust fact-correction model to post-edit summaries and improve their factual consistency. Through quantitative and qualitative experiments on two popular summarization datasets, CNN/DM and XSum, we show that our approach vastly outperforms prior methods in correcting erroneous summaries. Our model, FactEdit, improves factuality scores by over 11 points on CNN/DM and over 31 points on XSum on average across multiple summarization models, producing more factual summaries while maintaining competitive summarization quality.
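The core idea can be illustrated with a short sketch: corrupt a factual summary by masking a span and replacing it with a lower-ranked candidate from an infilling language model, so the result stays fluent but is likely non-factual, yielding a hard negative example for training the corrector. The sketch below is a minimal illustration assuming the HuggingFace transformers library; the model name, the masking heuristic, and the candidate-selection rule are simplified assumptions, not the paper's exact procedure.

```python
from transformers import pipeline

# Fill-mask pipeline over a generic masked LM (model choice is illustrative).
infill = pipeline("fill-mask", model="distilroberta-base")

factual = "Apple reported record revenue in its latest quarterly earnings."

# Mask a content word in the factual summary. This single-token substitution
# is a simplified heuristic; the paper targets spans such as entities and
# relations in the summary.
masked = factual.replace("record", infill.tokenizer.mask_token)

# Lower-ranked infills tend to be fluent but factually wrong, which makes
# them hard, representative negative examples rather than obvious corruptions.
candidates = infill(masked, top_k=10)
corrupted = candidates[-1]["sequence"]

# Pairing each corrupted summary with its factual original yields
# (input, target) examples for training a sequence-to-sequence post-editor.
print(corrupted)
```

Taking a low-ranked rather than random replacement is the key design choice: the candidate is still plausible under the language model, so the correction model must learn to check facts rather than merely detect disfluency.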