Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics

04/27/2021
by Artidoro Pagnoni, et al.

Modern summarization models generate highly fluent but often factually unreliable outputs. This motivated a surge of metrics attempting to measure the factuality of automatically generated summaries. Due to the lack of common benchmarks, these metrics cannot be compared. Moreover, all these methods treat factuality as a binary concept and fail to provide deeper insights into the kinds of inconsistencies made by different systems. To address these limitations, we devise a typology of factual errors and use it to collect human annotations of generated summaries from state-of-the-art summarization systems for the CNN/DM and XSum datasets. Through these annotations, we identify the proportion of different categories of factual errors in various summarization models and benchmark factuality metrics, showing their correlation with human judgment as well as their specific strengths and weaknesses.
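
As a rough illustration of what "correlation with human judgment" means when benchmarking a metric, the sketch below computes a plain Pearson correlation between per-summary human factuality labels and the scores of an automatic metric. The data values and the names `pearson`, `human_labels`, and `metric_scores` are hypothetical and chosen only for illustration. The abstract does not say which correlation statistic the authors use, so this should be read as a minimal sketch under those assumptions, not as the paper's procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1.0 = annotators judged the summary factual, 0.0 = not.
human_labels = [1.0, 0.0, 1.0, 0.0, 1.0]
# Hypothetical scores from some automatic factuality metric for the same summaries.
metric_scores = [0.9, 0.2, 0.7, 0.4, 0.8]

r = pearson(human_labels, metric_scores)
print(f"Pearson r between metric and human judgment: {r:.3f}")
```

A value of r near 1 would suggest that the metric ranks summaries much as the annotators do, while a value near 0 would suggest it carries little information about factuality on this sample.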


