Go Figure! A Meta Evaluation of Factuality in Summarization

by Saadia Gabriel, et al.

Text generation models can produce factually inconsistent text that distorts or fabricates facts about the source text. Recent work has focused on building evaluation models to verify the factual correctness of semantically constrained text generation tasks such as document summarization. Although the field of factuality evaluation is growing fast, it lacks well-defined criteria for measuring the effectiveness, generalizability, reliability, or sensitivity of factuality metrics. To address these aspects, we introduce a meta-evaluation framework for factual consistency metrics. We propose five necessary, common-sense conditions for effective factuality metrics and experiment with nine recent factuality metrics using synthetic and human-labeled factuality data from the short news, long news, and dialogue summarization domains. Our framework makes it possible to assess the efficiency of any new factual consistency metric along a variety of dimensions over multiple summarization domains, and it can be easily extended with new meta-evaluation criteria. We also present our conclusions towards standardizing factuality evaluation metrics.


