Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics

04/27/2021
by   Artidoro Pagnoni, et al.

Modern summarization models generate highly fluent but often factually unreliable outputs. This has motivated a surge of metrics that attempt to measure the factuality of automatically generated summaries; due to the lack of common benchmarks, however, these metrics cannot be compared directly. Moreover, they all treat factuality as a binary concept and provide little insight into the kinds of inconsistencies made by different systems. To address these limitations, we devise a typology of factual errors and use it to collect human annotations of summaries generated by state-of-the-art summarization systems on the CNN/DM and XSum datasets. Through these annotations, we identify the proportion of each category of factual error made by various summarization models and benchmark factuality metrics, showing their correlation with human judgment as well as their specific strengths and weaknesses.
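To make the benchmarking step concrete, the sketch below shows how an automatic factuality metric can be compared against human judgments by measuring correlation over a set of system-generated summaries. This is an illustration under assumed inputs, not the paper's actual evaluation code: the variable names, the example scores, and the choice of Pearson and Spearman correlation are assumptions made for the example.

```python
# Minimal sketch (illustrative only, not FRANK's evaluation code): correlate an
# automatic factuality metric's scores with human factuality judgments.
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-summary scores: human judgments (e.g., the fraction of summary
# sentences annotators marked as factual) and the corresponding metric scores.
human_judgments = [1.00, 0.50, 0.00, 0.75, 1.00, 0.25]
metric_scores   = [0.90, 0.60, 0.20, 0.70, 0.80, 0.40]

# Pearson measures linear agreement; Spearman measures agreement in ranking.
pearson_r, pearson_p = pearsonr(metric_scores, human_judgments)
spearman_rho, spearman_p = spearmanr(metric_scores, human_judgments)

print(f"Pearson r    = {pearson_r:.3f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3f})")
```

Because the FRANK annotations are broken down by error category, the same kind of comparison could in principle be repeated per category to expose a metric's specific strengths and weaknesses.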


Code Repositories

frank

FRANK: Factuality Evaluation Benchmark

