Asking and Answering Questions to Evaluate the Factual Consistency of Summaries

04/08/2020
by Alex Wang, et al.

Practical applications of abstractive summarization models are limited by frequent factual inconsistencies with respect to their input. Existing automatic evaluation metrics for summarization are largely insensitive to such errors. We propose an automatic evaluation protocol called QAGS (pronounced "kags") that is designed to identify factual inconsistencies in a generated summary. QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source. To evaluate QAGS, we collect human judgments of factual consistency on model-generated summaries for the CNN/DailyMail (Hermann et al., 2015) and XSUM (Narayan et al., 2018) summarization datasets. QAGS has substantially higher correlations with these judgments than other automatic evaluation metrics. Also, QAGS offers a natural form of interpretability: The answers and questions generated while computing QAGS indicate which tokens of a summary are inconsistent and why. We believe QAGS is a promising tool in automatically generating usable and factually consistent text.
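The protocol described above can be sketched in a few lines. In the full QAGS pipeline, questions are generated from the summary by a learned model and answered twice by a QA model, once conditioning on the summary and once on the source; those models are outside the scope of this sketch, so the two answer lists are taken as given and only the answer-comparison step (a SQuAD-style token F1, averaged over questions) is made concrete. All function names here are illustrative, not from the paper's released code.

```python
from collections import Counter

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between two answer strings (SQuAD-style overlap)."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def qags_score(answers_from_summary, answers_from_source) -> float:
    """QAGS-style score: mean answer similarity over the generated questions.

    Each position i holds the answers to question i, produced from the
    summary and from the source respectively. A factually consistent
    summary should yield matching answers, hence a score near 1.0.
    """
    pairs = list(zip(answers_from_summary, answers_from_source))
    if not pairs:
        return 0.0
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)

# Toy example: one consistent answer pair, one partially inconsistent one.
score = qags_score(["barack obama", "in 2015"], ["barack obama", "in 2014"])
print(round(score, 2))  # -> 0.75
```

A low per-question F1 points at the specific summary span whose answer disagrees with the source, which is the interpretability property the abstract mentions.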

Related research

05/04/2022 - Masked Summarization to Generate Factually Inconsistent Summaries for Improved Factual Consistency Checking
07/24/2020 - SummEval: Re-evaluating Summarization Evaluation
12/20/2022 - BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics
07/11/2022 - SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder
10/28/2019 - Evaluating the Factual Consistency of Abstractive Text Summarization
03/22/2022 - Towards Abstractive Grounded Summarization of Podcast Transcripts
01/28/2023 - MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization
