Go Figure! A Meta Evaluation of Factuality in Summarization

10/24/2020
by Saadia Gabriel, et al.

Text generation models can produce factually inconsistent text that distorts or fabricates facts about the source text. Recent work has focused on building evaluation models to verify the factual correctness of text produced in semantically constrained generation tasks such as document summarization. Although the field of factuality evaluation is growing fast, it lacks well-defined criteria for measuring the effectiveness, generalizability, reliability, or sensitivity of factuality metrics. To address these aspects, we introduce a meta-evaluation framework for evaluating factual consistency metrics. We propose five necessary, common-sense conditions for effective factuality metrics and experiment with nine recent factuality metrics, using synthetic and human-labeled factuality data from the short news, long news, and dialogue summarization domains. Our framework enables assessing the efficiency of any new factual consistency metric along a variety of dimensions over multiple summarization domains, and it can be easily extended with new meta-evaluation criteria. We also present our conclusions toward standardizing factuality evaluation metrics.


